
CS220: Computer Organization

Quiz#1

1. A four-bit adder computing A+B, where both A and B are four bits wide, is implemented using a read-only memory (ROM). The output of the computation is the sum and the carry out of the most significant position. Calculate the number of rows and the width of the ROM. (1.5+1.5 points)

Solution: We need to store the outputs for all possible inputs. The total input size is eight bits. So, the ROM needs 2^8 or 256 rows. There are five bits of output (four bits of sum and one bit of carry) to be stored for each input. So, each row needs to be five bits wide.

2. A register file has 32 registers, each 64 bits wide. The register file has a read port and a write port. For each port, there is a decoder which drives the wordlines. The read data appears on the bitlines and the data to be written must be launched on the bitlines. The ports are designed in such a way that a read operation and a write operation can be done at the same time. Calculate the number of input lines to each decoder, the total number of wordlines, and the total number of bitlines in the register file. (1+2+2 points)

Solution: There are 32 registers. So, the number of input lines to each decoder is log2(32) = 5. Each port requires a wordline for each row (register), so there are 2 x 32 = 64 wordlines. If you have assumed that a read and a write cannot be done simultaneously on the same register, then the read port can share its wordline with the write port (you need to OR the corresponding output wires from the two decoders to generate the wordline signal). In this case, the total number of wordlines would be 32. I have accepted this answer, but this is not the correct way of designing a register file. A register file designer leaves open the option of reading from and writing to a register simultaneously; it is up to the environment to decide how to use the register file correctly. Each port requires a bitline per column, and the read and write ports cannot share bitlines. So, the total number of bitlines is 2 x 64 = 128.

3. Consider a combinational logic function f that takes input x and produces output y. A positive edge-triggered flip-flop provides the input x and another positive edge-triggered flip-flop stores the output y. The hold time and the setup time of the flip-flops are 10 picoseconds and 200 picoseconds, respectively. The time required to compute f is 1200 picoseconds. The propagation delay through each flip-flop is 300 picoseconds. The maximum clock skew is 50 picoseconds. Calculate the minimum clock cycle time for the design to work correctly. (2 points)

Solution: The minimum clock cycle time should accommodate the propagation delay through the input flip-flop, the time to compute f, the setup time of the output flip-flop, and the clock skew. So, the minimum clock cycle time is (300+1200+200+50) picoseconds or 1750 picoseconds. If you have included the hold time in the calculation and everything else is correct, I have deducted one point.

Quiz#2

1. Consider a single-channel DIMM card with a 128-bit channel. The channel has two ranks. The DIMM card uses x8 chips and each chip has eight banks. If each bank has 16384 rows and 2048 columns, what is the total capacity of the DIMM card in bits? (2 points)

Solution: Since each chip provides an 8-bit output, the number of chips in a rank = channel width/output width = 128/8 = 16. Total DIMM capacity = bits per column x columns per row x rows per bank x banks per chip x chips per rank x ranks per channel x channels per DIMM = 8 x 2048 x 16384 x 8 x 16 x 2 x 1 bits = 2^36 bits = 64 Gb.

2. Suppose the following five access requests have come to a particular bank of a DRAM module, where each request is listed as (row number, column number): (10, 34), (4, 34), (8, 2), (10, 23), (8, 23). Show the sequence in which the requests must be sent to the bank such that it takes the minimum amount of time to complete all the requests. (2 points)

Solution: Accesses to the same row must be clubbed together so that each row is opened only once. One possible sequence: (10, 34), (10, 23), (4, 34), (8, 2), (8, 23).

3. Write down the two's complement binary representation of the decimal integer -59 assuming that eleven bits are used for the representation. (2 points)

Solution: First, we represent 59 using eleven bits: 00000111011. Next, we invert each bit (11111000100) and add one to the result to get the final answer: 11111000101.

4. Write down the binary representation of the decimal fraction 9.3125 in normalized scientific notation. (4 points)

Solution: 9.3125 = 1001.0101 = 1.0010101 x 2^3.
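
The sizing rules in Quiz#1 questions 1 to 3 reduce to a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and `register_file_lines` assumes, as the intended answer does, that the read and write ports share neither wordlines nor bitlines.

```python
import math

def rom_size(a_bits, b_bits):
    """Rows and width of a ROM storing A+B (sum plus carry-out) for every input pair."""
    rows = 2 ** (a_bits + b_bits)      # one row per input combination
    width = max(a_bits, b_bits) + 1    # sum bits plus one carry-out bit
    return rows, width

def register_file_lines(num_regs, width, ports):
    """Decoder inputs, wordlines and bitlines when no line is shared between ports."""
    decoder_inputs = int(math.log2(num_regs))   # 32 registers -> 5 select lines
    wordlines = ports * num_regs                # one wordline per register per port
    bitlines = ports * width                    # one bitline per column per port
    return decoder_inputs, wordlines, bitlines

def min_cycle_ps(clk_to_q, logic, setup, skew):
    """Longest-path constraint; hold time limits the shortest path, so it is not added."""
    return clk_to_q + logic + setup + skew

print(rom_size(4, 4))                      # (256, 5)
print(register_file_lines(32, 64, 2))      # (5, 64, 128)
print(min_cycle_ps(300, 1200, 200, 50))    # 1750
```
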

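For Quiz#2 questions 1 and 2, a minimal Python sketch follows. `dimm_capacity_bits` multiplies the same factors as the solution, and the hypothetical `group_by_row` helper keeps rows in order of first arrival, which yields the sequence given in the solution; any order that keeps each row's requests together serves the same purpose.

```python
def dimm_capacity_bits(channel_width, chip_width, ranks, banks, rows, cols, channels=1):
    """Total bits on a DIMM built from x`chip_width` chips."""
    chips_per_rank = channel_width // chip_width   # 128 / 8 = 16 chips
    bits_per_column = chip_width                   # a xN chip returns N bits per column
    return bits_per_column * cols * rows * banks * chips_per_rank * ranks * channels

def group_by_row(requests):
    """Reorder (row, col) requests so accesses to the same row are back to back.

    Rows are visited in order of first arrival; within a row the arrival order is kept.
    """
    by_row = {}                                    # dicts preserve insertion order
    for row, col in requests:
        by_row.setdefault(row, []).append((row, col))
    return [req for reqs in by_row.values() for req in reqs]

cap = dimm_capacity_bits(128, 8, ranks=2, banks=8, rows=16384, cols=2048)
print(cap == 2**36, cap // 2**30, "Gb")            # True 64 Gb
print(group_by_row([(10, 34), (4, 34), (8, 2), (10, 23), (8, 23)]))
# [(10, 34), (10, 23), (4, 34), (8, 2), (8, 23)]
```
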
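The two number conversions in Quiz#2 questions 3 and 4 can be reproduced as below. This is a sketch with hypothetical helper names; `normalized_binary` assumes a value of at least 1 and uses exact `fractions.Fraction` arithmetic so that no floating-point rounding enters the digit expansion.

```python
from fractions import Fraction

def twos_complement(value, bits):
    """`bits`-bit two's complement pattern of `value` as a string."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2^bits - 1 gives the same pattern as invert-and-add-one.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def normalized_binary(x, max_frac_bits=32):
    """Return (significand, exponent) with x = significand x 2^exponent, for x >= 1."""
    x = Fraction(x)                 # exact arithmetic, no float rounding
    exponent = 0
    while x >= 2:                   # normalize into [1, 2)
        x /= 2
        exponent += 1
    frac, digits = x - 1, []
    while frac and len(digits) < max_frac_bits:
        frac *= 2                   # next binary digit after the point
        if frac >= 1:
            digits.append("1")
            frac -= 1
        else:
            digits.append("0")
    return "1." + "".join(digits), exponent

print(twos_complement(-59, 11))        # 11111000101
print(normalized_binary("9.3125"))     # ('1.0010101', 3)
```
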
Quiz#3

General instructions: In all the questions, you will assume the 32-bit big-endian MIPS ISA.

1. Consider translating the following for loop into MIPS. Assume that i and N are allocated in registers $t0 and $t1, respectively.

for (i=0; i<N; i++) { loop body }

The skeleton of the MIPS translation is shown below. The bne instruction starts at address 0x00501000. What is the minimum possible address of the first instruction of the loop situated at the label start? Express your answer in hexadecimal. (2 points)

        ...
start:  ...
        loop body
        ...
        slt $t2, $t0, $t1
        bne $t2, $0, start

Solution: The negative value with the largest magnitude for the 16-bit PC-relative offset of the bne instruction is 0x8000. After sign-extension and shifting left by two bit positions, we get 0xfffe0000. The minimum value of the label start is obtained by adding this to 0x00501000. Therefore, the minimum value of the label start is 0x004e1000.

Grading policy: No partial marks.

2. Consider the following MIPS instruction sequence. What is the final hexadecimal value in $t0? (2 points)

lui $t0, 0x42
addi $t0, $t0, 0xabcd
sra $t0, $t0, 0x3

Solution: After the lui instruction, $t0 has 0x00420000. The immediate operand of the addi instruction is sign-extended (to 0xffffabcd) and added to $t0. After the addi instruction, $t0 has 0x0041abcd. When this is arithmetically shifted right by three bit positions, we get 0x00083579.

Grading policy: One mark if the contents of $t0 are correct after the addi instruction. One more mark for correct execution of the shift instruction.

3. Consider the following MIPS instruction sequence. Assume that initially $t0 contains 0x10000000 and $t1 contains 0x10000004. Initially, the word stored at address 0x10000000 is 0x12fedcba and the word stored at address 0x10000004 is 0xabcdef12. What is the final hexadecimal value of the word stored at address 0x10000004? (3 points)

lb $t0, 3($t0)
sh $t0, 0($t1)

Solution: Since MIPS is big-endian, the byte at address 0x10000003 is the least significant byte of the word at 0x10000000, so the lb instruction loads 0xba into $t0 and sign-extends it. So, $t0 has 0xffffffba after the lb instruction. The sh instruction stores the least significant half-word of $t0 (0xffba) into addresses 0x10000004 and 0x10000005. So, the final value of the word at address 0x10000004 is 0xffbaef12.

Grading policy: No partial marks.

4. Consider translating the following C statement, where the value of label is 0x0 and label1 is 2^28 instructions away. This information is available at the time of compilation of the statement. Show the MIPS translation of this C statement using the minimum number of instructions. (3 points)

label: goto label1

Solution: Since each instruction is four bytes, label1 is 2^28 x 4 = 2^30 bytes away from 0x0, so the value of label1 is 2^30, i.e., 0x40000000. A j instruction cannot reach this target because j keeps the upper four bits of PC+4 (here 0), so the target address is built in a register and reached with jr. Since the lower half of 0x40000000 is zero, a single lui suffices. The instruction sequence is shown below.

lui $at, 0x4000
jr $at

Grading policy: Any extra instruction or wrong immediate operand in the lui instruction will have a penalty of one mark, provided everything else is correct. If the instruction sequence is completely different, no partial marks.
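
A Python sketch of the 32-bit arithmetic behind Quiz#3 questions 1 and 2 follows; the helpers are hypothetical. The solution adds the scaled offset to the address of the bne itself. The MIPS architecture specification defines the branch target relative to the address of the instruction after the branch (PC + 4), which would give 0x004e1004, so both bases are shown.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def branch_target(base, offset16):
    """base + (sign-extended 16-bit offset << 2), truncated to 32 bits."""
    return (base + (sign_extend(offset16, 16) << 2)) & MASK32

def sra(value, shamt):
    """Arithmetic right shift of a 32-bit register value."""
    return (sign_extend(value, 32) >> shamt) & MASK32

# Q1: most negative offset 0x8000, added to the bne address as in the solution.
print(hex(branch_target(0x00501000, 0x8000)))       # 0x4e1000
# Same offset with the architectural PC + 4 base.
print(hex(branch_target(0x00501000 + 4, 0x8000)))   # 0x4e1004

# Q2: lui, addi with a sign-extended immediate, then sra.
t0 = (0x42 << 16) & MASK32
t0 = (t0 + sign_extend(0xabcd, 16)) & MASK32
print(hex(t0))           # 0x41abcd
print(hex(sra(t0, 3)))   # 0x83579
```
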

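To make the big-endian byte placement in Quiz#3 question 3 explicit, the sketch below models byte-addressed memory as a Python dict. It is an illustrative stand-in, not a MIPS simulator, and all names are hypothetical.

```python
def write_word(mem, addr, word):
    """Big-endian: the most significant byte goes to the lowest address."""
    for i in range(4):
        mem[addr + i] = (word >> (24 - 8 * i)) & 0xFF

def read_word(mem, addr):
    word = 0
    for i in range(4):
        word = (word << 8) | mem[addr + i]
    return word

def load_byte_signed(mem, addr):
    """lb: fetch one byte and sign-extend it to 32 bits."""
    b = mem[addr]
    return (b - 256 if b & 0x80 else b) & 0xFFFFFFFF

def store_half(mem, addr, value):
    """sh: store the low 16 bits, high byte first (big-endian)."""
    half = value & 0xFFFF
    mem[addr] = half >> 8
    mem[addr + 1] = half & 0xFF

mem = {}
write_word(mem, 0x10000000, 0x12fedcba)
write_word(mem, 0x10000004, 0xabcdef12)
t0 = load_byte_signed(mem, 0x10000000 + 3)   # lb $t0, 3($t0)
store_half(mem, 0x10000004, t0)              # sh $t0, 0($t1)
print(hex(t0), hex(read_word(mem, 0x10000004)))   # 0xffffffba 0xffbaef12
```
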
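The reasoning for Quiz#3 question 4 can be checked as follows, assuming the usual encoding of `j`, whose 26-bit field is shifted left by two and combined with the upper four bits of PC + 4. The helper names are hypothetical.

```python
def j_can_reach(pc, target):
    """True if `j` at address pc can encode target (same upper 4 bits as pc + 4)."""
    same_region = ((pc + 4) & 0xF0000000) == (target & 0xF0000000)
    return same_region and target % 4 == 0

def lui_split(target):
    """Upper and lower 16-bit halves of a 32-bit address."""
    return target >> 16, target & 0xFFFF

label = 0x0
label1 = label + (2 ** 28) * 4        # 2^28 instructions of 4 bytes each
print(hex(label1), j_can_reach(label, label1))   # 0x40000000 False
upper, lower = lui_split(label1)
print(hex(upper), lower)              # 0x4000 0
```

Because the lower half is zero, no `ori` is needed after `lui`, which is why the two-instruction `lui`/`jr` sequence is minimal here.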
Quiz#4

1. Suppose Booth's algorithm is used in a multiplication where the multiplicand and the multiplier are represented in two's complement and their respective values are 0xabcdef01 and 0xcdef01ab. Count the number of addition and subtraction operations. (1+1 points)

Solution: The number of addition and subtraction operations depends on the multiplier only. The multiplier is 1100 1101 1110 1111 0000 0001 1010 1011 0. I have also appended a zero on the least significant side as required by Booth's algorithm. The number of additions is equal to the number of transitions from 1 to 0 and the number of subtractions is equal to the number of transitions from 0 to 1 while scanning the multiplier from right to left. So, the number of additions is 7 and the number of subtractions is 8.

2. Consider a program that has 15% load/store instructions, 25% conditional branch instructions, 10% other types of control transfer instructions, and 50% arithmetic and logic instructions. The program is executed on a processor with an average CPI of 10 for load/store instructions, 4 for conditional branch instructions, 3 for other types of control transfer instructions, and 2 for arithmetic and logic instructions. Rank these four categories of instructions from most important to least important for optimizing the overall performance of the program. Assume that the clock frequency of the processor on which the program runs is kept constant during the optimization process. (2 points)

Solution: Per instruction, the average cycle contribution of loads/stores = 0.15 × 10 = 1.5, of conditional branches = 0.25 × 4 = 1.0, of other types of control transfer instructions = 0.1 × 3 = 0.3, and of ALU operations = 0.5 × 2 = 1.0. Therefore, the desired order of importance is Loads/Stores > Conditional branches = ALU operations > Other types of control transfer instructions.

Grading policy: 0.5 point if only the most important one or the least important one is correct. 1.0 point if both the most important and the least important ones are correct. 2.0 points if the entire order is correct.

3. Suppose the variables x, y, z are of signed type of length 32 bits and we would like to compute z = x − y. If x is 0xffff0abc, what is the range of permissible values of y so that no overflow occurs in the subtraction used to compute z? Express the upper and lower bounds in hexadecimal of length 32 bits. (3 points)

Solution: The representation range using the 32-bit two's complement format is [−2^31, 2^31 − 1]. Therefore, this is the allowed range for z, i.e., −2^31 ≤ x − y ≤ 2^31 − 1. While one can compute the range for y given x from these inequalities, we can simplify matters by observing that x is a negative number and hence, y can be any negative number without causing an overflow, because in a subtraction an overflow cannot occur when both operands are of the same sign. Therefore, y ≥ −2^31. To find how big y can be without causing an overflow, we use the allowed range of x − y deduced above to get y ≤ x + 2^31 (this is an unsigned addition with the carry out ignored). Therefore, 0x80000000 ≤ y ≤ 0x7fff0abc.

Grading policy: One point for correct lower bound and two points for correct upper bound.

4. Suppose 1001010 (in binary) is divided by 1001 (in binary). We would like to calculate the number of additions and subtractions when using the restoring, non-performing restoring, and non-restoring algorithms. Fill out the six entries in the table below. (0.5×6 points)

Table 1. Count of additions and subtractions in division algorithms

Algorithm                  Number of additions   Number of subtractions
Restoring                  3                     4
Non-performing restoring   0                     4
Non-restoring              3                     2

Solution: The quotient is 1000 and the remainder is 10. The filled table is shown above. The general rule in restoring division is that the number of subtractions is equal to the number of iterations and the number of additions is equal to the number of zeros in the quotient. For non-performing restoring division, the number of additions is zero and the number of subtractions is equal to the number of iterations. I have shown the steps involved in the non-restoring division below.

Remainder = 1001010
Divisor   = 1001000   Subtract [Iteration 1]
--------------------------------------------
Remainder = 0000010
Divisor   = 0100100   Subtract [Iteration 2]
--------------------------------------------
Remainder = 1011110   [Negative]
Divisor   = 0010010   Add [Iteration 3]
--------------------------------------------
Remainder = 1110000   [Negative]
Divisor   = 0001001   Add [Iteration 4]
--------------------------------------------
Remainder = 1111001   [Negative]
Divisor   = 0001001   Add [Extra iteration, cannot shift divisor any more]
--------------------------------------------
Remainder = 0000010   [Positive]
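
The transition-counting rule used for Booth's algorithm in Quiz#4 question 1 (and Quiz#5 Q3) can be written as a short Python function. This sketch assumes radix-2 Booth recoding of a 32-bit multiplier with the implicit appended zero; `booth_counts` is a hypothetical name.

```python
def booth_counts(multiplier, bits=32):
    """Return (additions, subtractions) for radix-2 Booth recoding of `multiplier`.

    Scans bit pairs (b_i, b_{i-1}) from right to left with an implicit b_{-1} = 0:
    pair (1, 0) -> subtract the multiplicand, pair (0, 1) -> add it.
    """
    adds = subs = 0
    prev = 0                                   # the appended zero
    for i in range(bits):
        cur = (multiplier >> i) & 1
        if (cur, prev) == (1, 0):
            subs += 1
        elif (cur, prev) == (0, 1):
            adds += 1
        prev = cur
    return adds, subs

print(booth_counts(0xcdef01ab))   # (7, 8)   Quiz#4 Q1
print(booth_counts(0xddaabbcc))   # (9, 10)  Quiz#5 Q3
```
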

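The division counts in Quiz#4 question 4 can be cross-checked with the sketch below. It uses unbounded Python integers instead of a fixed-width remainder register, pre-shifts the divisor as in the trace, and counts the final corrective addition as an operation; the function names are hypothetical.

```python
def non_restoring(dividend, divisor, q_bits):
    """Unsigned non-restoring division; returns (quotient, remainder, adds, subs)."""
    rem = dividend
    d = divisor << (q_bits - 1)          # divisor aligned with the first quotient bit
    quotient = adds = subs = 0
    for _ in range(q_bits):
        if rem >= 0:                     # non-negative remainder: subtract
            rem -= d
            subs += 1
        else:                            # negative remainder: add instead of restoring
            rem += d
            adds += 1
        quotient = (quotient << 1) | (1 if rem >= 0 else 0)
        d >>= 1
    if rem < 0:                          # extra corrective addition
        rem += divisor
        adds += 1
    return quotient, rem, adds, subs

def restoring_counts(quotient, q_bits):
    """(adds, subs) for restoring division: one subtraction per iteration,
    one restoring addition per 0 quotient bit."""
    return q_bits - bin(quotient).count("1"), q_bits

print(non_restoring(0b1001010, 0b1001, 4))   # (8, 2, 3, 2)
print(restoring_counts(0b1000, 4))           # (3, 4)
```

Running `non_restoring(14, 1, 4)` gives quotient 1110 with one addition and four subtractions, consistent with the Quiz#5 Q4 answer.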
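The inequalities −2^31 ≤ x − y ≤ 2^31 − 1 from Quiz#4 question 3 can also be solved for y directly, as a cross-check of the sign-based shortcut in the solution. Python's unbounded integers avoid any overflow in the check itself; the helper names are hypothetical.

```python
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1

def to_signed32(u):
    """Reinterpret a 32-bit pattern as a signed value."""
    return u - (1 << 32) if u & 0x80000000 else u

def to_hex32(v):
    return f"0x{v & 0xFFFFFFFF:08x}"

def y_range_for_sub(x_bits):
    """Range of 32-bit y for which x - y stays within 32-bit two's complement."""
    x = to_signed32(x_bits)
    lo = max(INT_MIN, x - INT_MAX)   # from x - y <= INT_MAX
    hi = min(INT_MAX, x - INT_MIN)   # from x - y >= INT_MIN
    return to_hex32(lo), to_hex32(hi)

print(y_range_for_sub(0xffff0abc))   # ('0x80000000', '0x7fff0abc')
```
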
4/28/2021 CS220 Quiz#5

Q1. Consider a program that has 15% load/store instructions, 15% conditional branch instructions, 10% other types of control transfer instructions, and 60% arithmetic and logic instructions. The program is executed on a processor with average load/store CPI of 7, conditional branch CPI of 4, other types of control transfer instructions CPI of 3, and arithmetic and logic instructions CPI of 2. Suppose the implementation of only one of the aforementioned four categories of instructions could be optimized to bring the CPI of that category down to 1 while keeping everything else unchanged. What is the maximum speedup achievable by this optimization? [2 points]

Let's assume that there are 100 instructions in the program.

Number of cycles spent in executing load/store = 15 × 7 = 105
Number of cycles spent in executing conditional branch = 15 × 4 = 60
Number of cycles spent in executing other control transfer instructions = 10 × 3 = 30
Number of cycles spent in executing arithmetic and logic instructions = 60 × 2 = 120

The maximum saving in cycles is obtained by optimizing the CPI of the load/store instructions (105 − 15 = 90 cycles saved, versus 60 for arithmetic and logic, 45 for conditional branch, and 20 for other control transfer instructions).
Maximum speedup = (105+60+30+120)/(15+60+30+120) = 315/225 = 1.4

Q2. A certain portion P of a program has been optimized such that the execution time of that portion has become one-fourth of the original time this portion used to take. The execution time of P after the optimization is one-third of the total post-optimization execution time of the program. What is the overall speedup enjoyed by the program due to this optimization? [1 point]

Let us suppose that P originally used to take x fraction of the total original execution time t. That has become tx/4 after optimization. Since tx/4 is one-third of the total post-optimization time, the total post-optimization time is 3tx/4. Since the execution time of everything other than P has remained unchanged, we have 3tx/4 − tx/4 = 2tx/4 = (1−x)t, or x = 2/3. So, the speedup = t/(3tx/4) = 2.

Q3. Suppose Booth's algorithm is used in a multiplication where the multiplicand and the multiplier are represented in two's complement and their respective values are 0xcddaabbc and 0xddaabbcc. Count the number of addition and subtraction operations. [2 points]

Multiplier = 1101_1101_1010_1010_1011_1011_1100_1100 (a zero is appended on the least significant side and the multiplier is scanned from right to left)
Number of 0 to 1 transitions = 10 = number of subtractions
Number of 1 to 0 transitions = 9 = number of additions

Q4. By inspecting the quotient of an unsigned division it is possible to infer the sequence of subtractions and additions that would have taken place if the division was done using the non-restoring division algorithm. Calculate the number of addition and subtraction operations if the quotient is 1110. [1 point]

Since the division is unsigned, the first operation is guaranteed to be a subtraction, and it produces the first quotient bit. In non-restoring division, a quotient bit of 1 (non-negative remainder) is followed by a subtraction and a quotient bit of 0 (negative remainder) is followed by an addition. The first three quotient bits are 1, so the second, third, and fourth operations are also subtractions, and the quotient after the third subtraction is 111. The fourth subtraction makes the remainder negative, which gives the last quotient bit 0 and the final quotient 1110. Since the final remainder is negative, one corrective addition restores it. So, there are one addition and four subtractions.
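
Both CPI questions (Quiz#4 question 2 and Quiz#5 Q1) weight each category's CPI by its share of the instruction mix. A minimal sketch with exact fractions follows; the category labels and function names are hypothetical.

```python
from fractions import Fraction as F

def cpi_contributions(mix, cpi):
    """Per-category share of the overall CPI: fraction of instructions x category CPI."""
    return {k: mix[k] * cpi[k] for k in mix}

def best_single_optimization(mix, cpi, new_cpi=1):
    """Lower one category's CPI to new_cpi at a time; return (category, speedup)."""
    base = sum(cpi_contributions(mix, cpi).values())
    best = None
    for k in mix:
        trial = {**cpi, k: new_cpi}                # only category k changes
        speedup = base / sum(cpi_contributions(mix, trial).values())
        if best is None or speedup > best[1]:
            best = (k, speedup)
    return best

# Quiz#4 Q2: rank categories by contribution (clock frequency fixed).
mix4 = {"ld/st": F(15, 100), "cond br": F(25, 100), "other ctl": F(10, 100), "alu": F(50, 100)}
cpi4 = {"ld/st": 10, "cond br": 4, "other ctl": 3, "alu": 2}
print({k: float(v) for k, v in cpi_contributions(mix4, cpi4).items()})
# {'ld/st': 1.5, 'cond br': 1.0, 'other ctl': 0.3, 'alu': 1.0}

# Quiz#5 Q1: best single category to bring down to CPI 1.
mix5 = {"ld/st": F(15, 100), "cond br": F(15, 100), "other ctl": F(10, 100), "alu": F(60, 100)}
cpi5 = {"ld/st": 7, "cond br": 4, "other ctl": 3, "alu": 2}
print(best_single_optimization(mix5, cpi5))     # ('ld/st', Fraction(7, 5))
```
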

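The Quiz#5 Q2 derivation generalizes to any speedup factor r for P and any post-optimization share s. With t normalized to 1, solving x/(rs) − x/r = 1 − x for x gives x = 1 / (1 + (1/s − 1)/r); the sketch below, with a hypothetical function name, evaluates it exactly.

```python
from fractions import Fraction

def portion_and_speedup(r, s):
    """P became r times faster and then takes share s of the new total.

    Returns (original fraction x of P, overall speedup), with t normalized to 1.
    """
    r, s = Fraction(r), Fraction(s)
    x = 1 / (1 + (1 / s - 1) / r)     # from x/(r*s) - x/r = 1 - x
    new_total = x / (r * s)           # P's new time divided by its share of the total
    return x, 1 / new_total

print(portion_and_speedup(4, Fraction(1, 3)))   # (Fraction(2, 3), Fraction(2, 1))
```

The result agrees with Amdahl's law, 1 / ((1 − x) + x/r) = 1 / (1/3 + 1/6) = 2.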
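The inference in Quiz#5 Q4 can be expressed as a small function. It assumes the convention of the Quiz#4 trace, where a negative final remainder costs one extra corrective addition; `ops_from_quotient` is a hypothetical name.

```python
def ops_from_quotient(quotient_bits):
    """Infer the add/subtract sequence of unsigned non-restoring division from its quotient.

    The first operation is a subtraction; afterwards a quotient bit of 1 is followed by a
    subtraction and a 0 by an addition. A final 0 bit leaves a negative remainder, which
    costs one corrective addition.
    """
    ops = ["sub"]
    for bit in quotient_bits[:-1]:
        ops.append("sub" if bit == "1" else "add")
    if quotient_bits[-1] == "0":
        ops.append("add")          # restore the negative final remainder
    return ops

ops = ops_from_quotient("1110")
print(ops, ops.count("add"), ops.count("sub"))   # ['sub', 'sub', 'sub', 'sub', 'add'] 1 4
```

For the Quiz#4 quotient 1000 the same rule gives sub, sub, add, add, add, matching the non-restoring row of Table 1.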


5/3/2021 CS220 Quiz#6

Q1. Consider a pipelined processor with ten stages S1, S2, ..., S10 with individual stage latencies 1 ns, 2 ns, ..., 10 ns. Suppose the branch instructions complete execution in stage S8. The processor has a branch predictor integrated in the S1 stage. On a correct prediction, there is no loss in performance. A program running on this processor has 25% branch instructions. This program suffers from only control hazards and no other hazards. Assume that there is no branch delay slot. This program experiences a branch prediction accuracy of 75%. If the program executes a total of 100 instructions, compute its execution time in nanoseconds (ns). [2 points]

Cycle time of the pipeline = latency of the longest stage = 10 ns.
Let us first assume that there is no hazard. After the first nine cycles, the pipeline completes one instruction in every cycle, so 100 instructions take 9 + 100 = 109 cycles. So, execution time without any hazard = 109*10 ns = 1090 ns. Now, due to the control hazard, only the mispredicted branch instructions are affected. Since a branch is resolved in S8, each misprediction squashes the instructions in S1 to S7 and leads to a loss of seven cycles. There are 25*0.25 = 6.25 mispredictions on average, so the additional time lost in dealing with the control hazard = 25*0.25*7*10 ns = 437.5 ns. So, the total execution time = 1090 + 437.5 = 1527.5 ns.

Q2. The designers of the processor in Q1 are trying to increase the clock frequency of the processor by subdividing any one of the ten stages into two substages. What is the best achievable frequency for this processor by this method? Express your answer in MHz. [1 point]

The longest stage needs to be subdivided. So, now we have an eleven-stage pipeline with stage latencies 1 ns, 2 ns, 3 ns, 4 ns, 5 ns, 6 ns, 7 ns, 8 ns, 9 ns, 5 ns, 5 ns. The longest stage now takes 9 ns, so the best frequency achievable is 1000/9 MHz or 111.11 MHz.

Q3. Consider a cache of 512 KB capacity and 32-byte block size. At what associativity would the tag length get maximized? What is the maximum tag length if the address is 36 bits long? Assume that a simple hash function that computes (A % number of sets) is used to extract the set index, where A is the block address. [1+1 points]

The cache has 512 KB / 32 B = 16384 blocks. The tag length is maximized when the cache is fully associative, because a single set leaves no address bits for the set index. In this case, that corresponds to an associativity of 16384. The maximum tag length = length of address − length of block offset = 36 − 5 bits = 31 bits.

Q4. Consider a cache with 512 sets and 32-byte block size. The address length is 32 bits. Let the address be A[31:0]. The set index is computed using the function A[31:23] XOR A[22:14] XOR A[13:5]. What should be the tag length? [1 point]

Since none of the original bits of the address can be recovered from the set index of a block, the tag needs to store all the address bits except the block offset. This is needed to make sure that all blocks in a set can be distinguished unambiguously. So, the tag length should be 32 − 5 = 27 bits.
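
A sketch of the Quiz#6 Q1 and Q2 timing model follows. It assumes, as the solutions do, a cycle time equal to the slowest stage, one completion per cycle once the pipeline is full, and resolve_stage − 1 lost cycles per misprediction with the predictor in S1; the function names are hypothetical.

```python
def pipeline_time_ns(stage_ns, n_instr, branch_frac, accuracy, resolve_stage):
    """Execution time of a linear pipeline that suffers only from branch mispredictions."""
    cycle = max(stage_ns)                              # slowest stage sets the clock
    ideal_cycles = len(stage_ns) + (n_instr - 1)       # 10 + 99 = 109 cycles
    mispredicts = n_instr * branch_frac * (1 - accuracy)
    penalty = resolve_stage - 1                        # instructions in S1..S7 are squashed
    return (ideal_cycles + mispredicts * penalty) * cycle

def best_split_mhz(stage_ns):
    """Split the slowest stage into two equal halves; return the new clock in MHz."""
    stages = sorted(stage_ns)
    longest = stages.pop()
    stages += [longest / 2, longest / 2]
    return 1000 / max(stages)                          # 1 ns period = 1000 MHz

stages = list(range(1, 11))                            # S1..S10 latencies in ns
print(pipeline_time_ns(stages, 100, 0.25, 0.75, 8))    # 1527.5
print(round(best_split_mhz(stages), 2))                # 111.11
```
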

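The tag-length reasoning of Quiz#6 Q3 and Q4 can be tabulated as below (hypothetical helper names, power-of-two sizes assumed). For the hashed index, `tag_bits` applies the rule from the Q4 solution. The last two helpers show that the XOR hash is invertible once A[31:14] is known, so a design that stores those bits could recompute A[13:5] from the set index when it needs a block's full address.

```python
def tag_bits(addr_bits, block_bytes, sets, hashed_index=False):
    """Tag width; with hashed_index the Q4 rule (keep all bits above the offset) is used."""
    offset = (block_bytes - 1).bit_length()   # log2 of a power-of-two block size
    index = (sets - 1).bit_length()           # log2 of a power-of-two set count
    return addr_bits - offset - (0 if hashed_index else index)

blocks = (512 * 1024) // 32                    # 16384 blocks
for ways in (1, 16, blocks):
    print(ways, tag_bits(36, 32, blocks // ways))
# 1 17
# 16 21
# 16384 31
print(tag_bits(32, 32, 512, hashed_index=True))   # 27

def xor_set_index(addr):
    """A[31:23] XOR A[22:14] XOR A[13:5] for a 32-bit address."""
    return ((addr >> 23) ^ (addr >> 14) ^ (addr >> 5)) & 0x1FF

def recover_mid_field(set_index, upper18):
    """Rebuild A[13:5] from the set index and A[31:14] (XOR is its own inverse)."""
    return set_index ^ (upper18 >> 9) ^ (upper18 & 0x1FF)
```
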
