UNIVERSITY OF BRISTOL

JANUARY 2022 Examination Period

FACULTY OF ENGINEERING

Examination for the Degree of


Bachelor and Master of Engineering, and Bachelor and Master of
Science

COMS10015
COMPUTER ARCHITECTURE

TIME ALLOWED:
3 hours

Answers to COMS10015: COMPUTER ARCHITECTURE


Part 1: weeks 1 to 5 (Dr. Daniel Page)
Q1. Identify which of the following Boolean expressions
A. x ∧ (x ∨ ¬x)
B. (x ∨ ¬x) ∧ x
C. x
D. ¬x
E. ¬((¬x ∨ ¬x) ∧ (¬x ∨ x))
is not equivalent to
x ∧ x ∨ x ∧ ¬x.

[1 mark]

Solution: Adding parentheses for clarity throughout, using either derivation

(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)
= (x ∨ ¬x) ∧ x (commutativity)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∨ (x ∧ ¬x) (idempotency)
= x ∨ 0 (inverse)
= x (identity)

(x ∧ x) ∨ (x ∧ ¬x)
= ¬(¬(x ∧ x) ∧ ¬(x ∧ ¬x)) (de Morgan)
= ¬((¬x ∨ ¬x) ∧ (¬x ∨ x)) (de Morgan)

or, failing that, enumeration

x   x ∧ x ∨ x ∧ ¬x   x ∧ (x ∨ ¬x)   (x ∨ ¬x) ∧ x   x   ¬x   ¬((¬x ∨ ¬x) ∧ (¬x ∨ x))
0   0                0              0              0   1    0
1   1                1              1              1   0    1

means we can conclude that

x ∧ x ∨ x ∧ ¬x ≢ ¬x.
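The enumeration above can also be carried out mechanically. The following C sketch evaluates the target expression and each option for both values of x; the function names and the 0-to-4 option indexing are illustrative assumptions, not part of the paper.

```c
#include <assert.h>
#include <stdbool.h>

/* Target expression: (x AND x) OR (x AND NOT x). */
static bool target(bool x) { return (x && x) || (x && !x); }

/* Options A to E, indexed 0 to 4 (hypothetical helper for illustration). */
static bool option(int which, bool x) {
    assert(which >= 0 && which <= 4);
    switch (which) {
    case 0:  return x && (x || !x);               /* A */
    case 1:  return (x || !x) && x;               /* B */
    case 2:  return x;                            /* C */
    case 3:  return !x;                           /* D */
    default: return !((!x || !x) && (!x || x));   /* E */
    }
}

/* Returns true if the option agrees with the target for x = 0 and x = 1. */
static bool is_equivalent(int which) {
    for (int x = 0; x <= 1; x++) {
        if (option(which, x != 0) != target(x != 0)) return false;
    }
    return true;
}
```

Calling is_equivalent for each index in turn should single out the one option that differs from the target.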

Q2. Consider the following truth table, which describes a Boolean function f :

w x y z f (w , x, y , z)
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

Which of the Karnaugh maps shown in Figure 1 will yield the most efficient (in terms of
the number of operators involved), correct Boolean expression for f ?
[1 mark]

Solution: At first glance, this question looks like a lot of work. However, we can
immediately rule out several options because the associated Karnaugh maps are clearly
invalid:

• option A is invalid because the dimensions do not match the truth table: it ignores
z, so is for a 3-input rather than 4-input function,

• option B is invalid because the content does not match the truth table: the truth
table has 6 entries equal to 1 whereas the Karnaugh map has 5,

• option D is invalid because the 3-element red group is invalid: groups must be
rectangular, but this is L-shaped.

So only options C and E remain. Even just looking at them, we can guess that option
C will yield a more efficient expression because it uses fewer, larger groups (option E
uses unit-sized groups only). In more detail

• Option C yields

r = f (w , x, y , z) = ( ¬x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y )

and thus 5 AND, 2 OR, and 6 NOT operators.

• Option E yields

r = f (w , x, y , z) = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z )

and thus 18 AND, 5 OR, and 16 NOT operators.

Even considering the (significant) potential for applying common sub-expression elimination, e.g.,
computing and sharing the result of ¬x once versus using one operator for each
instance, option C will clearly involve fewer operators.
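As a cross-check on correctness (rather than efficiency), the expression derived from option C can be compared against the truth table row by row. This is a minimal C sketch; the table is transcribed from the question with the row index taken as the 4-bit value wxyz, and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* f(w, x, y, z) from the truth table, indexed by the 4-bit value wxyz. */
static const int F_TABLE[16] = {1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0};

/* Looks up one row of the truth table. */
static bool table_value(int row) {
    assert(row >= 0 && row < 16);
    return F_TABLE[row] != 0;
}

/* Expression derived from option C. */
static bool option_c(bool w, bool x, bool y, bool z) {
    return (!x && !z) || (w && !y && !z) || (w && !x && !y);
}

/* Counts the rows where option C disagrees with the truth table. */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int i = 0; i < 16; i++) {
        bool w = (i >> 3) & 1, x = (i >> 2) & 1, y = (i >> 1) & 1, z = i & 1;
        if (option_c(w, x, y, z) != table_value(i)) mismatches++;
    }
    return mismatches;
}
```

A result of zero mismatches indicates the three groups cover exactly the six entries equal to 1.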

Q3. Imagine you are using a 1kB, byte-addressable SRAM device to provide a memory within
some larger system. In doing so, you make a mistake which means the 4-th address wire
A4 is not correctly connected: it therefore has the fixed value A4 = 0. Which of the
following options
A. 1
B. 4
C. 256
D. 512
E. 1024
reflects the number of addresses now accessible within the SRAM?
[1 mark]

Solution: A 1kB, byte-addressable SRAM would usually require n = 10 address wires.
The n-bit A means addresses between 0 and 2^n − 1 = 2^10 − 1 = 1023 are accessible.
Consider a small(er) example of an SRAM where n = 3: the addresses

A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 1 0 010(2) ≡ 2(10)
0 1 1 011(2) ≡ 3(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 1 0 110(2) ≡ 6(10)
1 1 1 111(2) ≡ 7(10)

are accessible. Now imagine the m-th address wire is misconnected where m = 1,
meaning A1 = 0: this yields

A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)

so now only addresses 0(10), 1(10), 4(10), and 5(10) are accessible. Put another way, 1/2
of the originally accessible addresses will remain accessible. The same fact applies for
any m, so for n = 10 (i.e., 2^10 = 1024 addresses) we conclude that 1024/2 = 512 addresses are accessible.
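The halving argument can be checked by brute force. The C sketch below forces one address bit to 0 and counts the distinct addresses that remain reachable; the function name and the limit of 16 address wires are assumptions made to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the distinct addresses reachable on an n-bit address bus when the
 * wire with index `stuck` is tied to 0. This sketch assumes n <= 16. */
static unsigned count_accessible(unsigned n, unsigned stuck) {
    static uint8_t seen[1u << 16];
    assert(n <= 16 && stuck < n);
    unsigned total = 1u << n, count = 0;
    for (unsigned a = 0; a < total; a++) seen[a] = 0;
    for (unsigned a = 0; a < total; a++) {
        unsigned effective = a & ~(1u << stuck);   /* force the stuck bit to 0 */
        if (!seen[effective]) {
            seen[effective] = 1;
            count++;
        }
    }
    return count;
}
```

For example, count_accessible(3, 1) models the small example above, and count_accessible(10, 4) models the 1kB device from the question.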

Q4. Figure 2 describes the instruction set of an example 4-register counter machine. Consider
some i -th encoded, machine code instruction 0A5(16) expressed in hexadecimal. Which
of the following
A. halt computation
B. if register 2 equals 0 then goto instruction 5, else goto instruction i + 1
C. if register 10 equals 0 then goto instruction 5, else goto instruction i + 1
D. increment register 2, then goto instruction i + 1
E. decrement register 10, then goto instruction i + 1
best describes the instruction semantics?
[1 mark]

Solution: First, note that

0A5(16) ≡ 000010100101(2) ↦ 010 10 0101(2),

where the three most-significant bits of the 12-bit value are discarded to leave a 9-bit
instruction, written here as its opcode (red), register address (green), and branch target
address (blue) fields. We can see that the (red) opcode 010(2) determines the instruction type, i.e.,

if Raddr = 0 then goto Ltarget else goto Li+1.

More specifically, the (green) register address 10(2) = 2(10) and the (blue) branch target
address 0101(2) = 5(10) mean the instruction semantics are

if R2 = 0 then goto L5 else goto Li+1,

i.e., if register 2 equals 0 then goto instruction 5, else goto instruction i + 1.
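The field extraction can be sketched in C as follows, assuming the layout given in Figure 2 (bits 8 to 6 for the opcode, bits 5 to 4 for the register address, bits 3 to 0 for the branch target). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 9-bit counter machine instruction, following Figure 2. */
typedef struct {
    unsigned opcode;   /* bits 8..6 */
    unsigned addr;     /* bits 5..4 */
    unsigned target;   /* bits 3..0 */
} decoded_t;

/* Splits a 9-bit instruction into its three fields. */
static decoded_t decode(uint16_t inst) {
    decoded_t d;
    assert(inst < 512u);   /* only 9 bits are meaningful */
    d.opcode = (inst >> 6) & 0x7u;
    d.addr   = (inst >> 4) & 0x3u;
    d.target = inst & 0xFu;
    return d;
}
```

Applying decode to 0x0A5 gives the opcode, register address, and target values used in the solution.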

Q5. Consider the 3 instruction formats used by MIPS32, as shown in Figure 3. Imagine the
number of general-purpose registers is halved: which of the following statements
A. There must be a greater number of R-type instructions
B. There must be a greater number of I-type instructions
C. There could be 9 times the number of R-type instructions
D. R-type instructions could be 2 bits smaller
E. I-type instructions could have an immediate field which is 2 bits larger
is most likely to then be true?
[1 mark]

Solution: If the number of general-purpose registers is halved, then the number of bits
required for each register address is decreased by one: before there are 32 general-
purpose registers requiring a 5-bit address, whereas afterwards there are 16 general-
purpose registers requiring a 4-bit address. This change leads to there being 3 and 2
unused bits in the R- and I-type formats respectively.
Some of the statements cannot be true, and some could be but are not likely to be
true. We can consider the statements one-by-one:

• There could be a greater number of R- or I-type instructions. This is unlikely,
however, because it implies the opcode field would no longer be aligned across formats.
Either way, the “must” term is too strong: there is no requirement for this to be
the case, even if it were possible.

• There could be a greater number of R-type instructions, but the 3 unused bits
would allow at most 2^3 = 8 times as many, not 9.

• MIPS32 uses a fixed-length encoding: based on this, no format can be larger or
smaller than another, meaning it would not be viable for R-type instructions to be
smaller (irrespective of by how much).

As such, the final statement is correct: there are 2 unused bits in the I-type instruction
format, so the imm field could indeed be 2 bits larger (so 18 bits afterwards, vs. 16
bits before).

Q6. Consider two unsigned, 8-bit integer variables, x and y, as declared in some C function by
using the type uint8_t. For how many assignments to these variables will the Hamming
weight of their unsigned, 8-bit integer sum, i.e., x + y, be zero? Put another way, how
many elements does the set

{(x, y) | H(x + y) = 0}

have?
A. 0
B. 1
C. 255
D. 256
E. 65536
[2 marks]

Solution: We know that

0 ≤ x, y < 2^8 = 256

due to their type and representation, so there are 256 · 256 = 65536 possible assignments
to the variables, i.e., pairs
(x, y)
to consider. Only the case where their sum x + y is zero will yield a Hamming weight
of zero: the representation of a non-zero sum will have at least one bit in it equal to
one, and hence a Hamming weight of greater than zero.
The obvious initial answer would be that the assignment x = 0 and y = 0 is the
only case where x + y = 0 and hence H(x + y) = 0. However, others exist due to
the effect of overflow: x = 255 and y = 1 should yield x + y = 256, for example,
but, due to overflow (i.e., the fact we cannot represent 256 as an unsigned, 8-bit
integer), actually yields x + y = 0 and hence H(x + y) = 0. Applying this principle
more generally, the correct answer is that 256 pairs will yield x + y = 0 and hence
H(x + y) = 0: put simply, every possible x has exactly one y, namely y = (256 − x) mod 256,
that will yield the sum x + y = 0.
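The count can be confirmed by exhaustive search. The C sketch below generalises the word size to a parameter so that a smaller case can be inspected by hand; the function names and the restriction to at most 8 bits are assumptions made for illustration.

```c
#include <assert.h>

/* Hamming weight (number of bits equal to 1) of v. */
static unsigned hamming_weight(unsigned v) {
    unsigned w = 0;
    while (v) {
        w += v & 1u;
        v >>= 1;
    }
    return w;
}

/* Counts the pairs (x, y) of unsigned `bits`-bit values whose wrapped sum
 * has a Hamming weight of zero. */
static unsigned count_zero_weight_pairs(unsigned bits) {
    assert(bits >= 1 && bits <= 8);
    unsigned limit = 1u << bits, count = 0;
    for (unsigned x = 0; x < limit; x++) {
        for (unsigned y = 0; y < limit; y++) {
            unsigned sum = (x + y) & (limit - 1u);   /* wrap-around models overflow */
            if (hamming_weight(sum) == 0) count++;
        }
    }
    return count;
}
```

With bits = 8 the search covers all 65536 pairs considered in the solution.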

Q7. Consider the fact that

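[Diagram: a multiplexer whose top input is 1, whose bottom input is 0, and whose control signal c is x, producing r, is equivalent (≡) to a NOT gate with input x and output r.]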

i.e., that one can implement a NOT gate using one instance of a 2-input, 1-bit multiplexer
component. Assuming you want to minimise the number of multiplexer instances, identify
how many are required to implement the expression
(x ∧ y ) ∨ z.
A. 1
B. 2
C. 3
D. 6
E. 8
[2 marks]

Solution: First, recall that the following truth table


c x y r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1
specifies the behaviour of a 2-input, 1-bit multiplexer: in short, we find that

r = x if c = 0
r = y if c = 1.

As such, we can implement an AND gate as follows:

[Diagram: a multiplexer whose top input is x, whose bottom input is y, and whose control signal c is x, producing r.]

We can show why the implementation is valid (i.e., produces a result matching AND)
by inspection:
x y r
0 0 x =0
0 1 x =0
1 0 y =0
1 1 y =1
Notice that x = 0 implies the multiplexer selects the top input and hence r = x, whereas
x = 1 implies the multiplexer selects the bottom input and hence r = y; overall, r clearly
matches AND in the sense r = 1 if and only if x = 1 and y = 1. Using the same approach, we can
implement OR as follows

[Diagram: a multiplexer whose top input is y, whose bottom input is x, and whose control signal c is x, producing r.]

and justify validity again by inspection:

x y r
0 0 y =0
0 1 y =1
1 0 x =1
1 1 x =1

Overall then, the expression


(x ∧ y ) ∨ z
can be implemented using just two multiplexers: one to implement the AND operator,
and one to implement the OR operator.
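The two-multiplexer construction can be modelled directly in C. In this sketch mux follows the truth table above (the first data input is selected when c = 0); the function names are illustrative assumptions.

```c
#include <stdbool.h>

/* 2-input, 1-bit multiplexer: returns in0 when c = 0 and in1 when c = 1. */
static bool mux(bool in0, bool in1, bool c) { return c ? in1 : in0; }

/* AND: x acts as the control signal, selecting x itself (0) or y. */
static bool and_mux(bool x, bool y) { return mux(x, y, x); }

/* OR: x acts as the control signal, selecting y or x itself (1). */
static bool or_mux(bool x, bool y) { return mux(y, x, x); }

/* (x AND y) OR z, using two multiplexer instances in total. */
static bool expr(bool x, bool y, bool z) { return or_mux(and_mux(x, y), z); }
```

Comparing expr against (x && y) || z for all eight input combinations is one way to confirm the construction.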

Q8. Consider the combinatorial logic design as shown in Figure 4, which is described using
N-type and P-type MOSFET transistors. Within the design, three inputs (i.e., x, y , and
z) and one output (i.e., r ) can be identified; note that several transistors (e.g., m0 )
and intermediate signals (e.g., t0 ) are annotated for reference. Which of the following
Boolean expressions
A. ¬x
B. ¬((x ∨ y ) ∧ z)
C. (¬(x ∨ y )) ∧ z
D. ¬(x ∧ y ∧ ¬z)

E. ¬(x ∨ y ∨ ¬z)
does the design implement?
[2 marks]

Solution: This question can be approached in several ways. First, one could employ
basic pattern matching: read from left-to-right, the three dominant structures can be
matched against known NAND, NOR, and NOT gate implementations. Writing ⊼ for NAND
and ⊽ for NOR, the design implements the expression

r = ¬((x ⊼ y) ⊽ z)

which we manipulate as follows

¬((x ⊼ y) ⊽ z)
= ¬((¬(x ∧ y)) ⊽ z) (NAND)
= ¬(¬((¬(x ∧ y)) ∨ z)) (NOR)
= ¬(x ∧ y ∧ ¬z) (de Morgan)

such that
r = ¬(x ∧ y ∧ ¬z).
Second, although it involves more work, one can enumerate the transistor and signal
states for each input combination. For example, using + (resp. −) to denote where a
given transistor is connected or activated (resp. disconnected or deactivated), we can
write
x y z m0 m1 m2 m3 t0 m4 m5 m6 m7 t1 m8 m9 r
0 0 0 + + − − 1 − − + + 0 + − 1
0 0 1 + + − − 1 − + + − 0 + − 1
0 1 0 − + + − 1 − − + + 0 + − 1
0 1 1 − + + − 1 − + + − 0 + − 1
1 0 0 + − − + 1 − − + + 0 + − 1
1 0 1 + − − + 1 − + + − 0 + − 1
1 1 0 − − + + 0 + − − + 1 − + 0
1 1 1 − − + + 0 + + − − 0 + − 1

and then derive the expression

r = ¬(x ∧ y ∧ ¬z)

directly.

Q9. Imagine that an ISA includes an instruction whose semantics are


GPR[x] ← MEM[MEM[GPR[y ]] + GPR[z]].

Based on this instruction alone, which of the following


A. CISC
B. RISC
C. Neither CISC nor RISC
D. Both CISC and RISC
best classifies the ISA?
[2 marks]

Solution: Classifying any ISA in absolute terms can be difficult, and open to opinion and
debate. However, this particular case is fairly clear. In short, this is a load instruction
which uses a fairly complex addressing mode. Rather than including instructions with
simple semantics which can be used as a “building block” to realise more complex
functionality, that functionality is combined into this single instruction. For example, the
same functionality could be realised via

GPR[t] ← MEM[GPR[y ]]
GPR[t] ← GPR[t] + GPR[z]
GPR[x] ← MEM[GPR[t]]

instead, and hence a simpler indirect addressing mode. However, that same complexity
allows a higher code density because fewer instructions (i.e., 1 vs. 3) are required.
Finally, note that the instruction performs two memory accesses. Since the value loaded
by the first is used as an address in the second, these accesses must be done in sequence;
this fact would suggest the instruction has a multi-cycle execution latency. Overall this
suggests that the ISA, or this instruction at least, is best classified as CISC.

Q10. Consider the sequential logic design as shown in Figure 5, which contains two D-type
flip-flops. Within the design, one output (i.e., r ) can be identified; note that several
intermediate signals (e.g., t0 ) are annotated for reference. If the clock signal clk has a
frequency of 400MHz, what is the frequency of r ?
A. 100MHz
B. 200MHz
C. 400MHz
D. 800MHz
E. 1600MHz
[3 marks]

Solution: To start with, keep in mind that this design uses flip-flops: these are edge-
triggered (versus latches, which are level-triggered). By focusing on and inspecting the
left-hand flip-flop, we infer that the state will be updated to reflect D = 1 ⊕ Q on each
positive edge of clk. Given the truth table

x y r
0 0 0
0 1 1
1 0 1
1 1 0

we find that D = 1 ⊕ Q ≡ ¬Q, suggesting, therefore, that this is a toggle flip-flop
constructed by using a D-type flip-flop: on each positive edge of clk, the state will
toggle either from 0 to 1 or from 1 to 0. Note that the right-hand flip-flop has a similar
construction, but that the lower input of the XOR comes from the left-hand flip-flop.
Imagine that both flip-flops are reset, so their initial state is 0. We can draw a waveform
which describes each signal:

[Waveform: clk, t0, t1, t2, t3, and r.]

Put simply, this suggests that each toggle flip-flop acts to halve the frequency: t1
changes on every positive edge of clk and so has half the frequency of clk, whereas t3
changes only on every other positive edge and so has a quarter of the frequency of clk.
Given that clk has a frequency of 400MHz, we therefore expect r = t3 to have
a frequency of 400/4 = 100MHz.
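The divide-by-four behaviour can be reproduced with a small simulation. This C sketch assumes, as in the solution, that both flip-flops reset to 0, that the left-hand stage computes D = 1 ⊕ Q, and that the right-hand stage computes D = t1 ⊕ Q; the function name is hypothetical.

```c
#include <assert.h>

/* Simulates the two-stage toggle circuit for `cycles` positive clock edges
 * and returns the number of rising edges observed on r. */
static int count_r_rising_edges(int cycles) {
    assert(cycles >= 0);
    int q0 = 0, q1 = 0;   /* both flip-flops assumed reset to 0 */
    int rising = 0;
    for (int i = 0; i < cycles; i++) {
        int next_q0 = 1 ^ q0;    /* left-hand stage:  D = 1 XOR Q  */
        int next_q1 = q0 ^ q1;   /* right-hand stage: D = t1 XOR Q */
        if (q1 == 0 && next_q1 == 1) rising++;
        q0 = next_q0;
        q1 = next_q1;
    }
    return rising;
}
```

Since each rising edge of r marks one period, 400 clock cycles producing 100 rising edges corresponds to 400MHz being divided down to 100MHz.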

Q11. Consider the design as shown in Figure 6, which implements a simple Finite State
Machine (FSM) using D-type latches and a 2-phase clock. Note that the r output
reflects whether the FSM is in an accepting state, the rst input resets the FSM into the
start state, and the Xi input drives transitions between states: the idea is that the i-th
element of a sequence
X = ⟨X0, X1, ..., Xn−1⟩
is provided as input, via Xi, in the i-th step. Assuming the entirety of X is consumed,
which of the following
A. r = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1
B. r = X0 ∧ X1 ∧ · · · ∧ Xn−1

C. r = X0 ∧ X1 ∧ · · · ∧ Xn−1
D. r = X0 ∨ X1 ∨ · · · ∨ Xn−1
E. r = X0 ∨ X1 ∨ · · · ∨ Xn−1
best describes the output from, or functionality of the FSM?
[3 marks]

Solution: Basically this question is asking us to reverse engineer the FSM implementation
into a design and hence functionality; to do that, we can step backwards through
the process that we would normally step forwards through.
The first step is therefore to inspect the implementation and extract pertinent features:
1) the bottom and top D-type latches capture 1-bit current and next states,
i.e., Q and Q′, respectively, 2) between the two we can identify an output function
r = ω(Q) = ¬Q and a transition function Q′ = δ(Q, rst, Xi) = (¬rst) ∧ (¬Xi ∨ Q). Note
that we can classify this as a Moore-type FSM, since the output r is determined by the
current state Q alone.
The next step is to reconstruct a concrete, tabular description of the FSM, i.e., a truth
table, using ω and δ:

rst  Xi  Q  Q′ (δ)  r (ω)
0    0   0  1       1
0    0   1  1       0
0    1   0  0       1
0    1   1  1       0
1    0   0  0       1
1    0   1  0       0
1    1   0  0       1
1    1   1  0       0

Because Q and Q′ are each represented by a single D-type latch, we can infer the FSM
has (at most) two states. Other assignments are possible provided we are consistent,
but the most natural would be to say Q = 0 ↦ S0 and Q = 1 ↦ S1. Given that rst = 1
forces Q′ = 0, we can infer that S0 is an initial state; given that r = ¬Q and so r = 1
iff Q = 0, we can infer that S0 is an accepting state.
The next step is to reconstruct an abstract, diagrammatic description of the FSM:

[State diagram: start → S0; S0 → S0 on Xi = 1; S0 → S1 on Xi = 0; S1 → S1 on Xi = 0 or Xi = 1.]

The final step demands some creativity, in the sense that we need to interpret the
functionality realised: although doing so is not trivial, we can approach it by trying to
explain in words what the FSM does step-by-step. For example, note that the FSM
starts in state S0 and stays there while the input is Xi = 1. However, as soon as it
encounters an input such that Xi = 0 it will transition to state S1: it stays there whether the
input is Xi = 0 or Xi = 1. So, put another way, the FSM

• stays in state S0 if Xi = 1 for all i; it therefore accepts such an input, meaning
r = 1,

• transitions to and stays in state S1 if Xi = 0 for any i; it therefore rejects such
an input, meaning r = 0.

This description matches the definition of AND: we have that

r = X0 ∧ X1 ∧ · · · ∧ Xn−1.
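The interpretation can be tested by simulating the FSM. The C sketch below encodes the transition function Q′ = (¬rst) ∧ (¬Xi ∨ Q) and the output function r = ¬Q from the solution; the names delta, omega, and run_fsm are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Transition function: Q' = (NOT rst) AND ((NOT Xi) OR Q). */
static bool delta(bool q, bool rst, bool xi) { return !rst && (!xi || q); }

/* Output function: r = NOT Q. */
static bool omega(bool q) { return !q; }

/* Resets the FSM, consumes all n inputs, and returns the final output r. */
static bool run_fsm(const bool *x, size_t n) {
    assert(x != NULL || n == 0);
    bool q = delta(false, true, false);   /* the reset step forces Q = 0 */
    for (size_t i = 0; i < n; i++) {
        q = delta(q, false, x[i]);
    }
    return omega(q);
}
```

Feeding in a sequence of all 1s should leave r = 1, whereas any sequence containing a 0 should leave r = 0, which is the behaviour of AND.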

Q12. Figure 7 outlines, at a high level, a 4-register counter machine implementation; Figure 8
completes said implementation, detailing internals of the decoder component. Note that
the multiplexer inputs should be read left-to-right, and use zero-based indexing. Using
the left-most multiplexer in the decoder as an example, if the 3-bit control-signal derived
from inst is 001(2) = 1(10) then the 1-st input is selected; this means the output is 2(10).
Which of the following
A. Li : if R3 = 0 then goto L9 else goto Li+1
B. Li : if R3 + 1 = 0 then goto L9 else goto Li+1
C. Li : R3 ← R3 + 1 then goto Li+1
D. Li : R3 ← 0 then goto Li+1
E. None of the above
describes the semantics of a machine code instruction 100111001(2) for this counter
machine? [3 marks]

Solution: Once fetched, the instruction inst = 100111001(2) is provided as input to
the decoder: based on the implementation given, the decoder will therefore produce

• op = 0(10) because inst8,...,6 = 100(2) = 4(10),

• wr = 1(10) because inst8,...,6 = 100(2) = 4(10),

• addr = 3(10) because inst5,...,4 = 11(2) = 3(10),

• target = 9(10) because inst3,...,0 = 1001(2) = 9(10),

• jmp = 0(10) because ¬inst8 ∧ inst7 ∧ ¬inst6 = 0 ∧ 0 ∧ 1 = 0,

• halt = 0(10) because ¬inst8 ∧ inst7 ∧ inst6 = 0 ∧ 0 ∧ 0 = 0,

as output. Looking then at the data- and control-path to assess how these outputs are
used to execute the instruction, we conclude that

• since addr = 3(10), the register R3 is read from,

• since op = 0(10), this value is discarded and a 0 output by the multiplexer is
written into register R0,

• since wr = 1(10), register R0 is written into register R3,

• since halt = 0(10), the counter machine does not halt,

• since jmp = 0(10), the program counter is incremented as normal (i.e., not set to
target, which in fact is unused).

In general then, this instruction will write 0 into register Raddr; given addr = 3 here,

Li : R3 ← 0 then goto Li+1

is therefore the correct semantics.

Part 2: weeks 7 to 11 (Dr. Tom Deakin)
Q13. Consider the following instructions which are inspired by the x86 ISA, and operate on
8-bit registers containing unsigned integers.

ADD r0, r1   r0 ← r0 + r1; set the carry flag (CF) if the result overflowed.
RCR r0       right-shift r0 through the carry flag (CF): set the new CF to the least-significant
(0-th) bit of the old r0, right-shift the old r0 by 1 bit to produce a new r0, set the
most-significant (7-th) bit of the new r0 to the old CF.

If we set CF = 0, r0 = 5, and r1 = 9, then execute the following program

add r0, r1
rcr r0

what values do the registers r0 and r1 have afterwards?


A. r0 = 28, r1 = 9
B. r0 = 9, r1 = 5
C. r0 = 7, r1 = 9
D. r0 = 6, r1 = 9
E. r0 = 14, r1 = 4
[2 marks]

Solution: The instruction sequence computes the average of r0 and r1: r0 = (r0 + r1)/2.
The ADD instruction sets r0 to the sum of r0 and r1: r0 ← 5 + 9 = 14. No overflow occurs, so CF = 0.
The RCR instruction performs a right shift by 1 bit (equivalent to division by two). For the 9-bit
number [CF, r0[7], r0[6], r0[5], r0[4], r0[3], r0[2], r0[1], r0[0]]
constructed from the 8 bits of r0 and the carry flag (CF), the instruction produces:
CF ← r0[0], r0 ← [CF, r0[7], r0[6], r0[5], r0[4], r0[3], r0[2], r0[1]]
Therefore, the RCR instruction sets r0 ← 7. r1 remains unchanged.
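The two instructions can be modelled in C to trace the register values. This is a sketch of the semantics as described in the question only (it is not a model of real x86 behaviour); the cpu_t type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Machine state: two 8-bit registers and the carry flag. */
typedef struct {
    uint8_t r0, r1;
    bool cf;
} cpu_t;

/* ADD r0, r1: r0 <- r0 + r1, setting CF if the 8-bit result overflowed. */
static void add_r0_r1(cpu_t *s) {
    assert(s != NULL);
    unsigned wide = (unsigned)s->r0 + (unsigned)s->r1;
    s->cf = wide > 0xFFu;
    s->r0 = (uint8_t)wide;
}

/* RCR r0: right-shift r0 by one bit through CF. */
static void rcr_r0(cpu_t *s) {
    assert(s != NULL);
    bool old_cf = s->cf;
    s->cf = (s->r0 & 1u) != 0;                                 /* new CF = old bit 0 */
    s->r0 = (uint8_t)((s->r0 >> 1) | (old_cf ? 0x80u : 0u));   /* bit 7 = old CF */
}
```

Because the carry is shifted back in as the most-significant bit, the pair of instructions also averages operands whose sum exceeds 255, e.g., 200 and 100.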

Q14. A stack data structure is often used to support subroutine calling. For a subroutine
written using a high level language (such as C), who or what is responsible for allocating
space on the stack for any local variables used?
A. The programmer.
B. The compiler.

C. The loader.
D. The linker.
E. The special registers.
[1 mark]

Solution: Tests understanding of the subroutine calling process.
The compiler will insert instructions to decrease the stack pointer sufficiently for the
subroutine’s local variables.
The programmer might have to do it by hand when writing assembly, but this question
specifies a high level language.

Q15. A program has the following mix of instructions:

Instruction Cycles Frequency


Load 4 20%
Store 4 15%
Branch 8 5%
Arithmetic 1 60%

What is the average number of clock cycles per instruction (CPI) for this program?
A. 2.40
B. 14.12
C. 2.08
D. 43.53
E. 0.17
[2 marks]

Solution: The CPI is computed as the weighted average of the instruction cycles where
the weights are the frequencies.

CPI = (4 × 20 + 4 × 15 + 8 × 5 + 1 × 60) / 100 = 240 / 100 = 2.4
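The weighted average can be written as a short C function. The sketch below takes the cycle counts and percentage frequencies as parallel arrays; the function name and parameter layout are assumptions made for illustration.

```c
#include <assert.h>

/* Weighted-average CPI: cycles[i] weighted by percent[i]. */
static double average_cpi(const int *cycles, const int *percent, int n) {
    assert(n > 0);
    int weighted = 0, total = 0;
    for (int i = 0; i < n; i++) {
        weighted += cycles[i] * percent[i];
        total += percent[i];
    }
    assert(total > 0);
    return (double)weighted / total;
}
```

Dividing by the sum of the weights, rather than a fixed 100, keeps the result meaningful if the frequencies are given as raw instruction counts instead of percentages.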

Q16. In a given computer system, accesses to main memory by the processor are supported
by a 16 KiB (16 384 = 2^14 bytes) direct mapped cache. Memory addresses are 16 bits,
and each addressable element has a word size of 4 bytes. Cache blocks are of size
64 bytes. Locations in the cache are numbered starting at 0.

Consider the memory address 0101010110101010 (decimal 21 930) where the least-
significant bit is on the right-hand side.
In which cache location is the data stored at this address placed, and what is the value in
the Tag store?
A. Location 01011010 (decimal 90), Tag store 0101.
B. Location 10101010 (decimal 170), Tag store 01010101.
C. Location 1010 (decimal 10), Tag store 01010101.
D. Location 01011010 (decimal 90), Tag store 1010.
E. Location 10101010 (decimal 170), Tag store 0101.
[3 marks]

Solution: Cache blocks are of size 2^6 bytes, but the memory is addressable in words of
4 bytes. Therefore, there are 2^6 / 2^2 = 2^4 words per cache line; hence the lowest 4 bits
of the memory address are used to index within a cache line. Note, the lowest 4 bits of
the memory address are safe to ignore in this question.
There are 2^14 / 64 = 2^14 / 2^6 = 2^8 locations in cache. Therefore, 8 bits of the memory
address are used to determine the location. They are the 5th–12th least-significant
bits, inclusive.
Finally, the remaining 4 most-significant bits are stored in the Tag.
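The field split can be expressed with shifts and masks. This C sketch hard-codes the 4-bit offset, 8-bit location, and 4-bit tag widths derived above; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a 16-bit memory address for this cache. */
typedef struct {
    unsigned tag;      /* 4 most-significant bits */
    unsigned index;    /* 8-bit cache location */
    unsigned offset;   /* 4-bit word offset within a cache line */
} cache_fields_t;

/* Splits a 16-bit address into tag, location, and offset. */
static cache_fields_t split_address(uint32_t addr) {
    cache_fields_t f;
    assert(addr <= 0xFFFFu);   /* addresses are 16 bits */
    f.offset = addr & 0xFu;
    f.index  = (addr >> 4) & 0xFFu;
    f.tag    = addr >> 12;
    return f;
}
```

For the address in the question, 0101010110101010(2) = 0x55AA, the location and tag can be read off the returned structure.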

Q17. Compute the two’s complement addition of the following two 8-bit numbers:

• 01100100
• 10111000

The result is:


A. 10101100 (-84 in decimal).
B. 11011100 (-36 in decimal).
C. 00011100 (28 in decimal).
D. 11100100 (-28 in decimal).
E. 11111100 (-4 in decimal).
[2 marks]

Solution: This question asks to sum 100 and -72.

  01100100
+ 10111000
  --------
  00011100   result
  11100000   carry out

The carry out of the most-significant bit is discarded, leaving 00011100, i.e., 28.
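The bit-by-bit addition, including the row of carries, can be reproduced in C. In this sketch bit i of the carries value records the carry out of bit position i; the function name and that encoding are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two 8-bit values one bit at a time. Returns the 8-bit sum and stores
 * the carry out of each bit position in *carries. */
static uint8_t add8(uint8_t a, uint8_t b, uint8_t *carries) {
    assert(carries != NULL);
    unsigned sum = 0, carry_bits = 0, carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        unsigned s = ai ^ bi ^ carry;                      /* sum bit */
        carry = (ai & bi) | (ai & carry) | (bi & carry);   /* carry out */
        sum |= s << i;
        carry_bits |= carry << i;
    }
    *carries = (uint8_t)carry_bits;
    return (uint8_t)sum;
}
```

Since the operands have different signs, the result cannot overflow the signed 8-bit range, so the discarded final carry does not indicate an error here.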

Q18. Consider the following extract from a C program:

int i, j;
int A[80];
for (i = 0; i < 8; ++i) {
  for (j = 0; j < 10; ++j) {
    int k = i * 10 + j;
    A[k] = j * 10 + i;
  }
}

Apply the strength reduction optimisation to this program.


How many additions and multiplications now occur in total, excluding the increments to
i and j in the for loops?
A. 116 additions and zero multiplications.
B. 28 additions and zero multiplications.
C. 168 additions and zero multiplications.
D. 28 additions and 2 multiplications.
E. 256 additions and zero multiplications.
[3 marks]

Solution: Application of a procedure from the lectures. It has a number of steps,
including counting at the end to answer the question directly.
There are two loops, so we apply the optimisation twice. For the inner j loop, the result
of the optimisation is:

int i, j;
int A[80];
for (i = 0; i < 8; ++i) {
  int l = i * 10 + 0;
  int m = 0 * 10 + i;
  for (j = 0; j < 10; ++j) {
    A[l] = m;
    l = l + 1;
    m = m + 10;
  }
}

Now, applying it to the i loop. Note that in the above m == i, so we don’t need to do
anything to it, and the optimisation is only applied to l.

int i, j;
int A[80];
int p = 0;
for (i = 0; i < 8; ++i) {
  int l = p;
  int m = i;
  for (j = 0; j < 10; ++j) {
    A[l] = m;
    l = l + 1;
    m = m + 10;
  }
  p = p + 10;
}

There are therefore 10 × 2 additions in the j loop. Each iteration of the i loop performs
the 20 additions for the j loop and 1 more, for a total of 8 × 21 = 168 additions, and
zero multiplications.
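Both the equivalence of the two versions and the final count can be checked by running them. The C sketch below wraps the original and strength-reduced loops in functions and instruments the latter with a counter; the function names and the additions counter are additions made for illustration only.

```c
#include <assert.h>

/* Original loop nest, using multiplications. */
static void fill_original(int A[80]) {
    for (int i = 0; i < 8; ++i) {
        for (int j = 0; j < 10; ++j) {
            int k = i * 10 + j;
            A[k] = j * 10 + i;
        }
    }
}

/* Strength-reduced loop nest. Returns the number of additions performed,
 * excluding the increments to i and j (and the instrumentation itself). */
static int fill_reduced(int A[80]) {
    int additions = 0;
    int p = 0;
    for (int i = 0; i < 8; ++i) {
        int l = p;
        int m = i;
        for (int j = 0; j < 10; ++j) {
            assert(l >= 0 && l < 80);   /* index stays within the array */
            A[l] = m;
            l = l + 1;
            m = m + 10;
            additions += 2;
        }
        p = p + 10;
        additions += 1;
    }
    return additions;
}
```

Filling two arrays, one with each function, and comparing them element by element confirms that the optimisation preserves the result.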

Q19. Which of the following might an ISA include to support system calls:
A. Timing interrupts.
B. Hardware interrupts.
C. Kernel scheduler.
D. Software interrupts.
E. Bus.
[1 mark]

Solution: A test of comprehension of the OS lecture, which covered all these topics.
The correct answer is software interrupts, as the processor needs a mechanism to pause
execution to refer to the OS on demand.

Question Q20, Question Q21 and Question Q22 all refer to the following computer system:
The system supports up to 2^32 bytes (4 GiB) of byte-addressable virtual memory. There are
2^29 bytes (512 MiB) of physical memory installed. The memory is split into pages of size 1 KiB
(2^10 bytes).

Q20. How many entries are there in the page table?


A. 2^22 (4 194 304).
B. 22.
C. 2^19 (524 288).
D. 19.
E. 8.
[1 mark]

Solution: There are 2^(32−10) = 2^22 pages, and so the Page Table needs this many entries.

Q21. How large are the entries in the page table?


A. 32 bits.
B. 29 bits.
C. 10 bits.
D. 19 bits.
E. 22 bits.
[1 mark]

Solution: We can store 2^(29−10) = 2^19 pages in physical memory, and so each entry of
the page table stores 19 bits to identify the physical page.

Q22. The ISA for this computer system uses 32-bit memory addresses. Which bits are used
unmodified in the translated physical address?
A. The 19 least-significant bits.
B. The 10 least-significant bits.
C. The 10 most-significant bits.

D. The 22 least-significant bits.


E. The 22 most-significant bits.
[1 mark]

Solution: A memory address is 32 bits long. The 32 − 10 = 22 most-significant bits
are used as the index to the page table, the remaining 10 least-significant are used to
identify bytes within a page. The page table supplies the 19 bits to combine with the
10 least-significant bits of the virtual address to provide a physical address.
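The address arithmetic for this system can be sketched in C. The constants follow the sizes given above; the function names are hypothetical, and the 19-bit frame number is passed in directly rather than looked up, since the contents of the page table are not specified.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 10u   /* 1 KiB pages */
#define VPN_BITS  22u   /* 32 - 10: index into the page table */
#define PFN_BITS  19u   /* 29 - 10: size of each page table entry */

/* Number of entries in the page table. */
static uint32_t page_table_entries(void) { return 1u << VPN_BITS; }

/* Index into the page table: the 22 most-significant bits. */
static uint32_t vpn_of(uint32_t vaddr) { return vaddr >> PAGE_BITS; }

/* Offset within a page: the 10 least-significant bits, used unmodified. */
static uint32_t offset_of(uint32_t vaddr) {
    return vaddr & ((1u << PAGE_BITS) - 1u);
}

/* Combines a 19-bit frame number with the unmodified page offset. */
static uint32_t physical_address(uint32_t frame, uint32_t vaddr) {
    assert(frame < (1u << PFN_BITS));
    return (frame << PAGE_BITS) | offset_of(vaddr);
}
```

Every address produced this way fits in 29 bits, matching the 2^29 bytes of physical memory installed.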

Q23. During the assembly of an assembly program into machine code, labels are resolved.
Consider the following Hex 8 assembly program:

Line 1 - BR L1
Line 2 - L2
Line 3 - DATA 1
Line 4 - L3
Line 5 - DATA 1
Line 6 - L1
Line 7 - LDBM L2
Line 8 - LDAM L3
Line 9 - STAM L2
Line 10 - ADD
Line 11 - STAM L3
Line 12 - LDBC 8
Line 13 - SUB
Line 14 - BRN L1

During label resolution, the labels on lines 7, 8, 9, 11, and 14 are resolved. What constant
values are they replaced with, respectively? Assume no extra instructions are created.
Recall the Hex 8 execution cycle: Fetch, Increment program counter (PC), Execute.
You may assume that the PC is set to the line number when that line is fetched.
A. 2, 4, 2, 4, -8
B. 3, 5, 3, 5, 8
C. -4, -3, -6, -6, -8
D. 2, 4, 2, 4, 8
E. 3, 5, 3, 5, -8
[2 marks]

Solution: This Hex 8 program generates the Fibonacci numbers in memory up to 8.
L1 refers to the instruction on Line 7. L2 refers to the instruction on Line 3. L3 refers
to the instruction on Line 5.
The LDAM, LDBM, and STAM instructions all take immediate values for the label. Hence,
L2 = 3 on lines 7 and 9, and L3 = 5 on lines 8 and 11.
The BRN instruction updates the PC, and so the relative offset must be used. The PC
is incremented after fetch, and before execute, so the PC is one greater than the line
number. Hence, L1 resolves to 7 − 15 = −8 on Line 14.

Q24. A bus is used to communicate data between the CPU and memory, which operate on
different clock frequencies. A number of signal/control wires are used to implement a
communication protocol between the CPU and memory.
The CPU requests data from memory by sending the appropriate signals for a read. How
does the CPU know that the data has arrived and it is safe to copy from the bus?
A. The data has arrived after the memory signals it is ready.
B. The data has arrived after a fixed number of CPU cycles.
C. The data has arrived after a fixed number of memory cycles.
D. The data has arrived after the CPU signals it is ready.
E. The data arrives after a fixed number of both CPU and memory cycles.
[1 mark]

Solution: This question tests the different situations where synchronous and asyn-
chronous buses are used, and how they work at a high level. The key detail is that
the CPU and memory have different clocks, and so an asynchronous bus must be used,
which operates on control signals rather than against the clock. Therefore, the data
arrives once the memory tells the CPU it’s on the bus.

Q25. Many ISAs, including x86 and Arm, provide extensions to support SIMD vector instruc-
tions. These instructions perform the same operation on each element of the vector.
Two input programs are written, both performing the same total number of computations
(arithmetic operations). One uses SIMD vector instructions and the other does not.
Which of these statements is true?
A. The program which uses vector instructions results in fewer instructions to be
fetched and decoded.
B. The program which uses vector instructions results in more instructions to be
fetched and decoded.

C. The program which uses vector instructions results in more computations to
be performed by the processor.
D. The program which uses vector instructions results in fewer computations to
be performed by the processor.
E. The program which uses vector instructions will always take longer to
execute.
[1 mark]

Solution: This question tests an application of the effect of SIMD instructions.


The vector program will contain fewer instructions, and so fewer instructions will be
fetched/decoded for a fixed amount of arithmetic.
The question contains no detail about the speed of execution or the implementation of
the SIMD instructions in the micro-architecture. Therefore the answer regarding speed
of execution is not always true.

Q26. A ripple-carry full adder can be turned into an adder-subtractor by converting one of the
inputs to two’s complement. The control signal is sent to the first carry-in bit of the first
1-bit full-adder. Which logic gate can be used to combine each bit of one of the inputs
with the control signal to complete the conversion in order to compute subtraction?
A. AND gate.
B. OR gate.
C. XOR gate.
D. NOT gate.
E. NAND gate.
[1 mark]

Solution: This question tests understanding of the implementation of two’s complement
conversion. One input needs to be inverted with a logical NOT operation only if the
control signal is active; the control signal on the first carry-in then supplies the +1
that completes the conversion.
If a NOT gate is used, a multiplexer needs to select between the two versions; the
question asks for the input to be combined with the control signal, and so a NOT gate
alone doesn’t combine the signals.
AND, OR and NAND gates have the incorrect logic tables for the required inversion of
the input bit based on the control bit.
The XOR gate provides the correct inversion.
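The role of the XOR gates can be seen in a bit-level model. The C sketch below assumes an 8-bit ripple-carry design in which the control signal sub is XORed with each bit of b and also drives the first carry-in; the width and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit ripple-carry adder-subtractor: computes a + b when sub = 0 and
 * a - b (modulo 256) when sub = 1. */
static uint8_t add_sub8(uint8_t a, uint8_t b, unsigned sub) {
    assert(sub <= 1u);
    unsigned result = 0;
    unsigned carry = sub;   /* the control signal is the first carry-in */
    for (int i = 0; i < 8; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = ((b >> i) & 1u) ^ sub;       /* XOR: invert only if sub = 1 */
        unsigned s = ai ^ bi ^ carry;              /* full-adder sum */
        carry = (ai & bi) | (carry & (ai ^ bi));   /* full-adder carry out */
        result |= s << i;
    }
    return (uint8_t)result;
}
```

With sub = 0 each XOR passes its bit of b through unchanged, and with sub = 1 it inverts that bit, which is the conditional inversion that AND, OR, and NAND cannot provide.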

Additional figures and tables

[Figure 1 diagrams omitted: five Karnaugh maps, labelled A to E, over the inputs w, x, y, and z.]

Figure 1: A set of 5 different Karnaugh maps, captioned with an associated option.

Instruction                                          Encoding (bits 8..6, bits 5..4, bits 3..0)
Li : Raddr ← Raddr + 1 then goto Li+1                000  addr  0000
Li : Raddr ← Raddr − 1 then goto Li+1                001  addr  0000
Li : if Raddr = 0 then goto Ltarget else goto Li+1   010  addr  target
Li : halt                                            011  00    0000

Figure 2: The instruction set for an example 4-register counter machine.

Format   Fields (bit ranges)
R-type   opcode [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  shamt [10:6]  funct [5:0]
I-type   opcode [31:26]  rs [25:21]  rt [20:16]  imm [15:0]
J-type   opcode [31:26]  imm [25:0]

Figure 3: A diagrammatic description of the 3 instruction formats used by MIPS32.

[Figure 4 diagram omitted: transistors m0 to m9 connected between Vdd and Vss, with inputs x, y, and z, intermediate signals t0 and t1, and output r.]

Figure 4: A combinatorial logic design, described using N-type and P-type MOSFET transistors.

[Figure 5 diagram omitted: two D-type flip-flops clocked by clk, with a constant 1 input, annotated signals t0, t1, t2, and t3, and output r.]

Figure 5: A sequential logic design, containing two D-type flip-flops.

[Figure 6 diagram omitted: two D-type latches enabled by φ1 and φ2, with inputs rst and Xi and output r.]

Figure 6: Implementation of a simple FSM, using D-type latches and a 2-phase clock.

[Figure 7 diagram omitted: registers R0, R1, ..., Rr−1, the program counter PC, memory MEM, and instruction register IR, connected to the decoder outputs op, wr, addr, target, jmp, and halt; updates are enabled by ¬halt ∧ Φ1 and ¬halt ∧ Φ2.]

Figure 7: The high-level data- and control-path for an example 4-register counter machine.
[Figure 8 diagram omitted. Recoverable details: op is selected by inst8,...,6 from the multiplexer inputs 1, 2, 0, 0, 0, 0, 0, 0; wr is selected by inst8,...,6 from the multiplexer inputs 1, 1, 0, 0, 1, 0, 0, 0; addr = inst5,...,4; target = inst3,...,0; jmp and halt are derived from inst8, inst7, and inst6.]

Figure 8: The low-level decoder implementation for an example 4-register counter machine.

END OF PAPER