School of Engineering
Department of Computer and Communication Engineering
Fall 2015 – 2016
Instructors: Dr. Zaher Merhi, Dr. Ali Ghouwayel, Dr. Ayman Khalil, Dr. Ali Bazzi, Dr.
Abdelmehsen Ahmad, Dr. Ousama Tahan, Dr. Reda Shbib
There are three questions in this booklet, each with several parts. Please answer all parts of
the three questions to the best of your ability.
Marking Scheme:
Page 1 of 10
Question 1: Single Cycle Data Path (20 points)
Consider the datapath below for a single-cycle 32-bit MIPS processor.
Assume that we are executing the following instruction
[Figure: single-cycle MIPS datapath with numbered lines]
Register and PC values before executing add:
$t0 = 0x00018FCA
$t1 = 0x000076FC
$t2 = 0x0000A5B0
PC = 0x000016F0
a) Fill the table below with the content of the lines indicated by the numbers on the figure (12
points). Write down all bits in binary or hex.
1
2
3
4
5
6
b) For each of the following instructions fill the respective tables (8 points)
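add
RegDst: ____    Branch: ____
MemRead: ____   MemtoReg: ____
ALUOp: ____     MemWrite: ____
ALUSrc: ____    RegWrite: ____
sw
RegDst: ____    Branch: ____
MemRead: ____   MemtoReg: ____
ALUOp: ____     MemWrite: ____
ALUSrc: ____    RegWrite: ____
slt
RegDst: ____    Branch: ____
MemRead: ____   MemtoReg: ____
ALUOp: ____     MemWrite: ____
ALUSrc: ____    RegWrite: ____
beq
RegDst: ____    Branch: ____
MemRead: ____   MemtoReg: ____
ALUOp: ____     MemWrite: ____
ALUSrc: ____    RegWrite: ____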
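As a point of comparison for part b), the sketch below tabulates the control settings of the standard textbook single-cycle MIPS control unit. This is an assumption: the exam's figure is not reproduced here, and a datapath with different multiplexer conventions would need different values. The names `CONTROL`, `INSTRUCTION_CLASS`, and `control_signals` are illustrative only.

```python
# Control-signal settings for the textbook single-cycle MIPS control unit.
# ASSUMPTION: the exam's datapath follows the same conventions.
# "X" marks a don't-care value.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

CONTROL = {
    "R-type": ("1", "0", "0", "1", "0", "0", "0", "10"),
    "lw":     ("0", "1", "1", "1", "1", "0", "0", "00"),
    "sw":     ("X", "1", "X", "0", "0", "1", "0", "00"),
    "beq":    ("X", "0", "X", "0", "0", "0", "1", "01"),
}

# add and slt are both R-type: the main control unit treats them the same,
# and the ALU control distinguishes them through the funct field.
INSTRUCTION_CLASS = {"add": "R-type", "slt": "R-type", "sw": "sw", "beq": "beq"}


def control_signals(mnemonic):
    """Return a dict mapping signal name to value for a supported mnemonic."""
    values = CONTROL[INSTRUCTION_CLASS[mnemonic]]
    return dict(zip(SIGNALS, values))


for name in ("add", "sw", "slt", "beq"):
    print(name, control_signals(name))
```

Under these assumptions, add and slt produce identical rows, since the main control unit only sees the opcode.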
Question 2: MIPS Pipeline (40 points)
a) Consider the following MIPS program, run on a 5-stage pipelined MIPS processor:
lw   $t0, 0($s2)
sw   $t0, 0($s1)
add  $t0, $t0, $t1
sw   $t0, 4($s1)
addi $s1, $s1, -4
i. Assume that no forwarding is employed and fill the table below. Indicate stalls by
writing ST in the clock cycle (C) where they occur. (10 points)
Clock cycle
Instruction C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18
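The no-forwarding case can be checked with a small scheduling sketch. It assumes the usual textbook conventions, which the exam does not state: the register file is written in the first half of a cycle and read in the second half, so a dependent instruction may decode in the same cycle as its producer's WB, and instructions issue strictly in order. The function name and tuple layout are illustrative.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("lw   $t0, 0($s2)",   "$t0", ["$s2"]),
    ("sw   $t0, 0($s1)",   None,  ["$t0", "$s1"]),
    ("add  $t0, $t0, $t1", "$t0", ["$t0", "$t1"]),
    ("sw   $t0, 4($s1)",   None,  ["$t0", "$s1"]),
    ("addi $s1, $s1, -4",  "$s1", ["$s1"]),
]


def schedule_no_forwarding(program):
    """Return (rows, total_cycles) for a 5-stage pipeline without forwarding.

    Each row is (text, stall_cycles, id_cycle, wb_cycle).
    ASSUMPTION: an instruction can decode (ID) in the same cycle in which
    the producer of its operand performs WB.
    """
    last_wb = {}   # register -> WB cycle of its most recent writer
    rows = []
    prev_id = 1    # makes the first instruction decode in cycle 2 (IF in cycle 1)
    for text, dest, sources in program:
        earliest_id = prev_id + 1                      # in-order: one ID per cycle
        ready = max((last_wb.get(r, 0) for r in sources), default=0)
        id_cycle = max(earliest_id, ready)             # wait for operands
        wb_cycle = id_cycle + 3                        # ID -> EX -> MEM -> WB
        rows.append((text, id_cycle - earliest_id, id_cycle, wb_cycle))
        if dest is not None:
            last_wb[dest] = wb_cycle
        prev_id = id_cycle
    return rows, rows[-1][3]


rows, total = schedule_no_forwarding(PROGRAM)
for text, stalls, id_cycle, wb_cycle in rows:
    print(f"{text:22s} stalls={stalls} ID=C{id_cycle} WB=C{wb_cycle}")
print("total cycles:", total)
```

With these assumptions, each sw waits two cycles for the preceding producer of $t0, and the program completes in 13 cycles. A register file without split-cycle access would add one more stall per dependency.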
ii. Now assume that forwarding is employed and fill the table below. Indicate stalls by writing
ST in the clock cycle (C) where they occur, and forwarding by an arrow. (12 points)
Clock cycle
Instruction C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18
b) Consider the following MIPS program:
i. Assume that the above program is run on a single-issue 5-stage pipelined
processor with forwarding and stalls. Calculate the CPI (clock cycles
per instruction). Detail your answer. (6 points)
ii. Assume that the above program is run on a two-issue (VLIW) pipelined processor
with forwarding and stalls, but with NO loop unrolling allowed.
Calculate the CPI. Detail your answer. Assume also that, within the same 2-issue
packet, a data transfer instruction can be combined only with an ALU or branch
instruction. (6 points)
Note: the number of rows does not relate to the correct solution.
iii. Assume that the above program is run on a two-issue (VLIW) pipelined processor
with forwarding and stalls, and that loop unrolling is now allowed;
however, you are only allowed to unroll the loop twice. Calculate the CPI. Detail your
answer. Assume also that, within the same 2-issue packet, a data transfer instruction
can be combined only with an ALU or branch instruction. (6 points)
Note: the number of rows does not relate to the correct solution.
Question 3: Memory Organization and Design (40 points)
a) Consider a direct-mapped cache with 32-bit, word-addressable memory addresses.
Assume a 2-word block and a cache size of 8 locations.
i. How many tag bits and index bits are there? (2 points)
ii. Starting with an empty cache, assume the following memory address references (word
addressable) in the corresponding order
Address 3 180 43 2 191 88 190 14 181 44 186 253
Address        3 180 43 2 191 88 190 14 181 44 186 253
Block number
Hit/Miss
2. List the final state of the cache. List only valid entries (those containing data)
by filling in the following table, where the data field corresponds to
MEM[address], for example MEM[16]. (4 points)
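The reference trace can be replayed with a short simulator. It is a sketch under one explicit assumption: "8 locations" is read as 8 cache blocks of 2 words each. If it instead means 8 words (4 blocks), pass `num_blocks=4`. The function names are illustrative.

```python
ADDRESSES = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]


def field_widths(address_bits=32, num_blocks=8, words_per_block=2):
    """Return (tag_bits, index_bits, offset_bits) for power-of-two sizes."""
    offset_bits = (words_per_block - 1).bit_length()
    index_bits = (num_blocks - 1).bit_length()
    return address_bits - index_bits - offset_bits, index_bits, offset_bits


def simulate_direct_mapped(addresses, num_blocks=8, words_per_block=2):
    """Replay word addresses through an initially empty direct-mapped cache.

    ASSUMPTION: num_blocks=8 interprets "8 locations" as 8 cache blocks.
    Returns (trace, cache): trace rows are (address, block, index, outcome)
    and cache maps each valid index to the block number it holds.
    """
    cache = {}
    trace = []
    for addr in addresses:
        block = addr // words_per_block   # memory block number
        index = block % num_blocks        # cache line the block maps to
        hit = cache.get(index) == block
        cache[index] = block              # on a miss, the old block is replaced
        trace.append((addr, block, index, "Hit" if hit else "Miss"))
    return trace, cache


trace, final_cache = simulate_direct_mapped(ADDRESSES)
for addr, block, index, outcome in trace:
    print(f"addr={addr:3d} block={block:3d} index={index} {outcome}")
for index in sorted(final_cache):
    first_word = final_cache[index] * 2
    print(f"index {index}: MEM[{first_word}], MEM[{first_word + 1}]")
print("tag, index, offset bits:", field_widths())
```

Under the 8-block reading, the trace yields hits only on addresses 2, 190, and 181, and the address splits into 28 tag bits, 3 index bits, and 1 block-offset bit.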
b) A processor runs at 2 GHz and has a CPI of 1.2, not including the stall cycles due to
cache misses. Load and store instructions account for 30% of all instructions. The processor
has an I-cache and a D-cache. The hit time is 1 clock cycle. The I-cache has a 2% miss rate.
The D-cache has a 5% miss rate on load and store instructions. The miss penalty is 50 ns, which
is the time to access and transfer a cache block between main memory and the processor. (18
points)
i) What is the average memory access time for instruction access in clock cycles?
(4 points)
ii) What is the average memory access time for data access in clock cycles? (2
points)
iii) Given that the CPI is 1.2 with no stalls (i.e. a perfect cache), calculate the new CPI,
taking into consideration the stalls caused by I-cache and D-cache misses. (4
points)
iv) You are considering replacing the 2 GHz CPU with one that runs at 4 GHz, but is
otherwise identical. How much faster does the new processor run? (8 points)
Assume that hit time in the I-cache and the D-cache is 1 clock cycle in the new
processor, and the time to access and transfer a cache block between main memory
and the processor is still 50 ns.
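The arithmetic in parts i) to iv) can be laid out as a short calculation. The sketch uses the standard formulas AMAT = hit time + miss rate × miss penalty and CPI = base CPI + memory stall cycles per instruction, and assumes every instruction makes exactly one I-cache access. The function and key names are illustrative.

```python
def cache_performance(clock_ghz, base_cpi=1.2, mem_fraction=0.30,
                      i_miss=0.02, d_miss=0.05, penalty_ns=50.0, hit_cycles=1):
    """Return AMAT, CPI, and time per instruction for the given clock rate.

    ASSUMPTION: one instruction fetch per instruction, and the miss penalty
    is fixed in nanoseconds, so it costs more cycles at a faster clock.
    """
    penalty_cycles = penalty_ns * clock_ghz           # ns * cycles/ns
    amat_i = hit_cycles + i_miss * penalty_cycles     # instruction accesses
    amat_d = hit_cycles + d_miss * penalty_cycles     # data accesses
    stall_cpi = i_miss * penalty_cycles + mem_fraction * d_miss * penalty_cycles
    cpi = base_cpi + stall_cpi
    return {
        "penalty_cycles": penalty_cycles,
        "amat_i": amat_i,
        "amat_d": amat_d,
        "cpi": cpi,
        "ns_per_instruction": cpi / clock_ghz,
    }


slow = cache_performance(2.0)
fast = cache_performance(4.0)
speedup = slow["ns_per_instruction"] / fast["ns_per_instruction"]
print(slow)
print(fast)
print(f"speedup: {speedup:.3f}")
```

At 2 GHz the 50 ns penalty is 100 cycles; at 4 GHz it doubles to 200 cycles, which is why doubling the clock rate gives a speedup well below 2 under these assumptions.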
c) Assume a cache with 16K blocks, 4 words per block, and 32-bit byte addresses.
Find the total number of sets and the total number of tag bits needed for the entire
cache if the cache is: (6 points)
i) 2-way set associative
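The set and tag-bit counts follow directly from the address breakdown. The sketch below assumes 4-byte (32-bit) words, which the question implies but does not state, and power-of-two sizes. The function name is illustrative.

```python
def set_associative_tag_bits(num_blocks, ways, words_per_block=4,
                             bytes_per_word=4, address_bits=32):
    """Return (num_sets, tag_bits_per_block, total_tag_bits).

    ASSUMPTION: 4-byte words and power-of-two sizes.
    """
    num_sets = num_blocks // ways
    index_bits = (num_sets - 1).bit_length()                    # selects the set
    offset_bits = (words_per_block * bytes_per_word - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    return num_sets, tag_bits, tag_bits * num_blocks            # one tag per block


sets, tag_bits, total = set_associative_tag_bits(16 * 1024, ways=2)
print(sets, tag_bits, total)
```

For the 2-way case this gives 8192 sets, a 15-bit tag per block (32 − 13 index bits − 4 offset bits), and 15 × 16K = 245,760 tag bits in total. Other associativities can be checked by changing `ways`.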