Calculating Fibonacci Numbers Using Pipelined Processor

Question 1.
The function F is deﬁned as F (1) = F (2) = F (3) = 1 and for n ≥ 3,
F (n + 1) = F (n) · (F (n − 1) + F (n − 2))
i.e., the (n + 1)-th value is given by the product of the n-th value and the sum of the
(n − 1)-th and (n − 2)-th values.
(a) Write an assembly program for computing the k-th value F (k), where k is an
integer bigger than 3 read from a memory location M, and storing F (k) at memory
location M. Use the instruction set in the Instruction Set Architecture document
(b) Consider a pipelined processor, where the pipeline stages are IF (instruction
fetch), ID (instruction decode), RR (register read), EX (execute instruction), WB
(write back result into register). Describe what happens in the pipeline stages for
the various types (data movement, data processing, control) of instructions.
(c) Show the execution of your program on the above pipelined processor for k = 5
by drawing a diagram. Assume that the fetched and decoded instructions are stored
in an instruction window IW with unlimited capacity (you can store any number of
instructions in the IW), and that there is no resource conflict between fetching
instructions and executing data transfer instructions. Explain where and why delay
slots appear.
Suggested solution for question 1.
Part (a).
In assembly language, the programmer needs to break down the task at hand to
the smallest steps possible, to the level that they can be serviced by the resources
available in the processor and the instructions provided.
Consider the resources that are required to evaluate the given expression:
1. Registers to hold the (n + 1)-th, (n)-th, (n − 1)-th and (n − 2)-th values of the
expression => 4 registers are needed. Perhaps one more register will be
needed as an intermediate register keeping intermediate results.
2. A loop will be used to calculate the values held in the registers. The loop
needs to be repeated as many times as required until the value of the control
variable will be equal to 3, starting from the value set by the user in memory.
=> 1 register is needed for the control variable
3. The result is returned to memory. No more resources are needed.
Consider the steps required to evaluate the given expression:

1. evaluate the parenthesis by adding F(n-1) with F(n-2)
2. multiply the result with F(n)
3. store F(n-1) to F(n-2), F(n) to F(n-1)
Consider the steps that the programme needs to apply to satisfy the requirement
1. setup the registers

2. evaluate the expression
3. repeat the evaluation until n = 3
4. store the result in memory
Implement the above steps:
I1 LOAD r0, M Register r0 will hold the control variable, now is loaded
with the content of memory location M
I2 LOAD r1, #1 Register r1 will hold F(n-1), now is loaded with 1
I3 LOAD r2, #1 Register r2 will hold F(n-2), now is loaded with 1
I4 LOAD r3, #1 Register r3 will hold F(n), now is loaded with 1
All four registers are now ready to be used to calculate the value of the above formula until
r0 = 3.
I5 SUB r0, r0, #1 Decrement the control variable by one as we are about t
o execute the loop for one (more) time.
The following four instructions calculate the current value of F:
I6 ADD r4, r1, r2 Add the contents of r2 F(n-2) to those of r1 F(n-1) and
store the result in an intermediate register r4
I7 LOAD r1, r2 Move the contents of the register holding F(n-1) to the
register holding F(n-2)
I8 LOAD r2, r3 Move the contents of the register holding F(n) to the
register holding F(n-1)
I9 MUL r3, r3, r4 Multiply the content of the register holding the
intermediate result of the addition to the register holding
the current F(n). The register now holds F(n+1)
The following instruction tests whether the terminal condition holds true
I10 BNE I5, r0, #3 If the control variable held in r0 does not equal 3, branch
to I5 to repeat the loop
If the control variable held in r0 equals 3, proceed to the

instruction following this one.
I11 STOR M, r3 Store the result in memory.
Part (b).
The above programme consists of a number of types of instructions:
LOAD, MUL, ADD, BNE, STOR.
All the instructions goes through IF and ID. For each type, describe what happens if and
when they go through each of the other stages of the pipeline. This is a simple description
that comes straight out of the lecture notes.
Part (c).
Cycle IF ID IW RR EX WB Comments
nr.
1 I1 I1 fetched
2 I2 I1 I2 fetched, I1 decoded
I3 fetched, I2 decoded, I1 executed, reading data
3 I3 I2 I1
from memory.
I4 fetched, I3 decoded, I1 writes result to the
4 I4 I3 I2 I1
register r0.
I5 fetched, I4 decoded, I3 is placed in IW because
5 I5 I4 I3 I2
WB is busy as I2 writes result to the register r1.
6 I6 I5 I4 I3 I6 fetched, I5 decoded, I4 is placed in IW because
WB is busy as I3 writes result to the register r2.
I7 fetched, I6 decoded, I5 reads registers, I4
7 I7 I6 I5 I4
writes result to the register r3
8 I8 I7 I6 I5
executed subtracting 1 from r0.
9 I9 I8 I7 I6 I5
executed adding r2 to r1, I5 writes result to r0
I10 fetched, I9 decoded, I8 enters IW because I7
10 I10 I9 I8 I7 I6 cannot move from RR to WB as I6 is in WB,
I6 writes result to r4
11 I11 I10 I9 I8 I7
is in RR, I8 reads registers, I7 writes result to r1
I11 decoded, I10 enters IW because I9 is in RR,
12 I11 I10 I9 I8
I11 enters IW because I10 is in RR.
13 I11 I10 I9 I10 reads registers, I9 executed performing the
multiplication of r4 with r3.
I11 reads register r3, I10 is executed, comparing
14 I11 I10 I9
r0 to 3, I9 writes result to r3
Because the condition evaluated to true (r0 not
15 I10 equal to 3), I11 is discarded and I10 writes PC
with the address of I5
16 I5 I5 fetched
17 I6 I5 I6 fetched, I5 decoded
18 I7 I6 I5 I7 fetched, I6 decoded, I5 reads registers
19 I8 I7 I6 I5
executed subtracting 1 from r0.
20 I9 I8 I7 I6 I5
executed adding r2 to r1, I5 writes result to r0
21 I10 I9 I8 I7 I6 cannot move from RR to WB as I6 is in WB,
22 I11 I10 I9 I8 I7
is in RR, I8 reads registers, I7 writes result to r1
I11 decoded, I10 enters IW because I9 is in RR,
23 I11 I10 I9 I8
I11 enters IW because I10 is in RR.
24 I11 I10 I9 I10 reads registers, I9 executed performing the
multiplication of r4 with r3.
I11 reads register r3, I10 is executed comparing
25 I11 I10 I9 r0 to 3 (terminal condition is now true), I9 writes
the result to register r3
I11 is executed, writing the result to the memory
26 I11
location.
Conflicts:
In cycles 4, 5, 6 and then in cycles 21, 22, 23 instructions I1 & I2, I2 & I3, I3 & I4,
respectively need the WB stage.
In cycles 10,11 and 12, instructions I7 & I6, I7 & I8, I8 & I9, respectively need the WB
stage.
In cycle 14, until I10 concludes its comparison, it is not known whether I5, or I11 will be
executed.
Question 2.
A computer has a cache, main memory and a hard disk. If a referenced word is in the
cache, it takes 7 ns to access it. If it is in main memory but not in the cache, it takes 55 ns
to load the block containing it into the cache (this includes the time to originally check the
cache), and then the reference lookup is started again. If the word is not in main memory,
it takes 10 ms to load (the block containing) it from the disk into main memory, and then
the reference lookup is started again. The cache hit ratio is 0.25. In the case of a cache
miss, the probability that the word is in the main memory is 0.6. Compute the average
load time.
Suggested solution for question 2.

Step 1:
Consider what “cache hit ratio is 0.25” means:
For every 100 instructions fetched, 25 of them have their operands in the cache and need
7 ns each to be accessed. The remaining 75, have their operands either in the main
memory, or on the hard drive. Therefore, the system has to copy them first onto the cache
memory and then execute them.
Consider what “the probability that the word is in the main memory is 0.6” means:
For every 100 instructions fetched, 60 of them are located in main memory and 40 are
located on the hard drive.
Those that are located in main memory, they must first be copied onto cache and then
executed. Therefore, the system first copies them onto cache (55 ns) and then access
them (7 ns). The total time required to access them: 55 ns + 7 ns = 62 ns
Those that are located on the hard drive, they must first be copied from the hard drive onto
the main memory (10 ms, or 107 ns) and then copied onto cache (55 ns) and then
accessed (7 ns). The total time required to access them:
107 ns + 55 ns + 7 ns = 107 ns + 62 ns
Step 2:
Calculate how many instructions are fetched from each location:
25 instructions from cache
75 * 0.6 = 45 instructions from main memory and
75 * 0.4 = 30 instructions from the hard drive
Step 3:
Calculate the total time needed for all type of instructions:
25 * 7 ns + 45 * 62 ns + 30 * (107 + 62) ns = 175 ns + 2790 ns + 3 * 108 ns + 1860 ns
= 4825 + 300,000,000
= 300,004,825 ns
~ 300 msec
Step 4:
Averaging for 100 instruction, we divide the sum by 100 and find that the average time in
milliseconds required to access an instruction is: 3 ms

Calculating Fibonacci Numbers Using Pipelined Processor

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Calculating Fibonacci Numbers Using Pipelined Processor

Uploaded by

Copyright:

Available Formats

Question 1.

The function F is deﬁned as F (1) = F (2) = F (3) = 1 and for n ≥ 3,

Suggested solution for question 1.

3. The result is returned to memory. No more resources are needed.

Consider the steps required to evaluate the given expression:

1. setup the registers

Implement the above steps:

The following four instructions calculate the current value of F:

If the control variable held in r0 equals 3, proceed to the

I11 STOR M, r3 Store the result in memory.

Suggested solution for question 2.

You might also like