FACULTY OF ENGINEERING
COMS10015
COMPUTER ARCHITECTURE
TIME ALLOWED:
3 hours
[1 mark]
(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)
= (x ∨ ¬x) ∧ x (commutativity)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∨ (x ∧ ¬x) (idempotency)
= x ∨ 0 (inverse)
= x (identity)

(x ∧ x) ∨ (x ∧ ¬x)
= ¬(¬(x ∧ x) ∧ ¬(x ∧ ¬x)) (de Morgan)
= ¬((¬x ∨ ¬x) ∧ (¬x ∨ x)) (de Morgan)

x ∧ x ∨ x ∧ ¬x ≢ ¬x.
Page 2 of 32
Q2. Consider the following truth table, which describes a Boolean function f :
w x y z f (w , x, y , z)
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
Which of the Karnaugh maps shown in Figure 1 will yield the most efficient (in terms of
the number of operators involved), correct Boolean expression for f ?
[1 mark]
Solution: At first glance, this question looks like a lot of work. However, we can
immediately rule out several options because the associated Karnaugh maps are clearly
invalid:
• option A is invalid because the dimensions do not match the truth table: it ignores
z and so is for a 3-input rather than a 4-input function,
• option B is invalid because the content does not match the truth table: the truth
table has 6 entries equal to 1 whereas the Karnaugh map has 5,
• option D is invalid because the 3-element red group is invalid: groups must be
rectangular, but this is L-shaped.
So only options C and E remain. Even just looking at them, we can guess that option
C will yield a more efficient expression because it uses fewer, larger groups (option E
uses unit-sized groups only). In more detail:
• Option C yields
r = f (w, x, y, z) = ( ¬x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y )
• Option E yields
r = f (w , x, y , z) = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z )
Even considering the (significant) potential for applying common sub-expression elimination,
e.g., computing and sharing the result of ¬x once versus using one operator for each
instance, option C will clearly involve fewer operators.
Q3. Imagine you are using a 1 kB, byte-addressable SRAM device to provide a memory within
some larger system. In doing so, you make a mistake which means the 4-th address wire
A4 is not correctly connected: it therefore has the fixed value A4 = 0. Which of the
following options
A. 1
B. 4
C. 256
D. 512
E. 1024
reflects the number of addresses now accessible within the SRAM?
[1 mark]
Consider a small(er) example of an SRAM with n = 3 address wires: the addresses
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 1 0 010(2) ≡ 2(10)
0 1 1 011(2) ≡ 3(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 1 0 110(2) ≡ 6(10)
1 1 1 111(2) ≡ 7(10)
are accessible. Now imagine the m-th address wire is misconnected where m = 1,
meaning A1 = 0: this yields
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
so now only addresses 0(10) , 1(10) , 4(10) , and 5(10) are accessible. Put another way, 1/2
of the originally accessible addresses remain accessible. The same fact applies for
any m, so for the 1 kB SRAM with 2^10 = 1024 addresses we conclude that 1024/2 = 512
addresses are accessible.
Q4. Figure 2 describes the instruction set of an example 4-register counter machine. Consider
some i-th encoded, machine code instruction 0A5(16) expressed in hexadecimal. Which
of the following
A. halt computation
B. if register 2 equals 0 then goto instruction 5, else goto instruction i + 1
C. if register 10 equals 0 then goto instruction 5, else goto instruction i + 1
D. increment register 2, then goto instruction i + 1
E. decrement register 10, then goto instruction i + 1
best describes the instruction semantics?
[1 mark]
We can see that the (red) opcode 010(2) determines the instruction type: a conditional
branch. More specifically, the (green) register address addr = 10(2) = 2(10) and the (blue)
branch target address target = 0101(2) = 5(10) mean the instruction semantics are: if
register 2 equals 0 then goto instruction 5, else goto instruction i + 1, i.e., option B.
Q5. Consider the 3 instruction formats used by MIPS32, as shown in Figure 3. Imagine the
number of general-purpose registers is halved: which of the following statements
A. There must be a greater number of R-type instructions
B. There must be a greater number of I-type instructions
C. There could be 9 times the number of R-type instructions
D. R-type instructions could be 2 bits smaller
E. I-type instructions could have an immediate field which is 2 bits larger
is most likely to then be true?
[1 mark]
Solution: If the number of general-purpose registers is halved, then the number of bits
required for each register address is decreased by one: before there are 32 general-
purpose registers requiring a 5-bit address, whereas afterwards there are 16 general-
purpose registers requiring a 4-bit address. This change leads to there being 3 and 2
unused bits in the R- and I-type formats respectively.
Some of the statements cannot be true, and some could be but are not likely to be
true. We can consider the statements one-by-one:
• There could be a greater number of R-type instructions, but the 3 unused bits
would allow at most 8 times as many.
• MIPS32 uses a fixed-length encoding: based on this, no format can be larger or
smaller than another, meaning it would not be viable for R-type instructions to be
smaller (irrespective of how much).
As such, the final statement is correct: there are 2 unused bits in the I-type instruction
format, so the imm field could indeed be 2 bits larger (so 18 bits afterwards, vs. 16
bits before).
Q6. Consider two unsigned, 8-bit integer variables, x and y, as declared in some C function by
using the type uint8_t. For how many assignments to these variables will the Hamming
weight of their unsigned, 8-bit integer sum, i.e., x + y, be zero? Put another way, how
many elements does the set
{(x, y) | H(x + y) = 0}
have?
A. 0
B. 1
C. 255
D. 256
E. 65536
[2 marks]
[Figure: a 2-input, 1-bit multiplexer with constant inputs 1 (top) and 0 (bottom), control x, and output r, shown equivalent to a NOT gate, i.e., r = ¬x.]
i.e., that one can implement a NOT gate using one instance of a 2-input, 1-bit multiplexer
component. Assuming you want to minimise the number of multiplexer instances, identify
how many are required to implement the expression
(x ∧ y ) ∨ z.
A. 1
B. 2
C. 3
D. 6
E. 8
[2 marks]
[Figure: AND implemented with one multiplexer: x on the top input, y on the bottom input, x as the control c, output r.]
We can show why the implementation is valid (i.e., produces a result matching AND)
by inspection:
x y r
0 0 x =0
0 1 x =0
1 0 y =0
1 1 y =1
Notice that x = 0 implies the multiplexer selects the top input and hence r = x, whereas
x = 1 implies the multiplexer selects the bottom input and hence r = y ; overall, r clearly
matches AND in the sense that r = 1 iff x = 1 and y = 1. Using the same approach, we can
implement OR as follows
implement OR as follows
[Figure: OR implemented with one multiplexer: y on the top input, x on the bottom input, x as the control c, output r.]
x y r
0 0 y =0
0 1 y =1
1 0 x =1
1 1 x =1
Q8. Consider the combinatorial logic design as shown in Figure 4, which is described using
N-type and P-type MOSFET transistors. Within the design, three inputs (i.e., x, y , and
z) and one output (i.e., r ) can be identified; note that several transistors (e.g., m0 )
and intermediate signals (e.g., t0 ) are annotated for reference. Which of the following
Boolean expressions
A. ¬x
B. ¬((x ∨ y ) ∧ z)
C. (¬(x ∨ y )) ∧ z
D. ¬(x ∧ y ∧ ¬z)
E. ¬(x ∨ y ∨ ¬z)
does the design implement?
[2 marks]
Solution: This question can be approached in several ways. First, one could employ
basic pattern matching: read from left-to-right, the three dominant structures can be
matched against known NAND, NOR, and NOT gate implementations. As such, the
design computes

t0 = ¬(x ∧ y) (NAND)
t1 = ¬(t0 ∨ z) = ¬((¬(x ∧ y)) ∨ z) (NOR)
r = ¬t1 = ¬(¬((¬(x ∧ y)) ∨ z)) (NOT)

which simplifies via

¬(¬((¬(x ∧ y)) ∨ z))
= (¬(x ∧ y)) ∨ z (involution)
= ¬(x ∧ y ∧ ¬z) (de Morgan)

such that

r = ¬(x ∧ y ∧ ¬z).
Second, although it involves more work, one can enumerate the transistor and signal
states for each input combination. For example, using + (resp. −) to denote where a
given transistor is connected or activated (resp. disconnected or deactivated), we can
write
x y z m0 m1 m2 m3 t0 m4 m5 m6 m7 t1 m8 m9 r
0 0 0 + + − − 1 − − + + 0 + − 1
0 0 1 + + − − 1 − + + − 0 + − 1
0 1 0 − + + − 1 − − + + 0 + − 1
0 1 1 − + + − 1 − + + − 0 + − 1
1 0 0 + − − + 1 − − + + 0 + − 1
1 0 1 + − − + 1 − + + − 0 + − 1
1 1 0 − − + + 0 + − − + 1 − + 0
1 1 1 − − + + 0 + + − − 0 + − 1
from which we can read off

r = ¬(x ∧ y ∧ ¬z)

directly.
Solution: Classifying any ISA in absolute terms can be difficult, and open to opinion and
debate. However, this particular case is fairly clear. In short, this is a load instruction
which uses a fairly complex addressing mode. Rather than including instructions with
simple semantics which can be used as “building blocks” to realise more complex
functionality, that functionality is combined into this single instruction. For example,
the same functionality could be realised via
GPR[t] ← MEM[GPR[y ]]
GPR[t] ← GPR[t] + GPR[z]
GPR[x] ← MEM[GPR[t]]
instead, and hence a simpler indirect addressing mode. However, that same complexity
allows a higher code density because fewer instructions (i.e., 1 vs. 3) are required.
Finally, note that the instruction performs two memory accesses. Since the value loaded
by the first is used as an address in the second, these accesses must be done in sequence;
this fact would suggest the instruction has a multi-cycle execution latency. Overall this
suggests that the ISA, or this instruction at least, is best classified as CISC.
Q10. Consider the sequential logic design as shown in Figure 5, which contains two D-type
flip-flops. Within the design, one output (i.e., r ) can be identified; note that several
intermediate signals (e.g., t0 ) are annotated for reference. If the clock signal clk has a
frequency of 400MHz, what is the frequency of r ?
A. 100MHz
B. 200MHz
C. 400MHz
D. 800MHz
E. 1600MHz
[3 marks]
x y r
0 0 0
0 1 1
1 0 1
1 1 0
[Timing diagram for the signals clk, t0, t1, t2, t3, and r.]
Put simply, this suggests that each toggle flip-flop acts to halve the frequency: t1
toggles at half the frequency of clk and t3 toggles at a quarter of the frequency of clk.
Given that clk has a frequency of 400 MHz, we therefore expect r = t3 to toggle with
a frequency of 400/4 = 100 MHz.
Q11. Consider the design as shown in Figure 6, which implements a simple Finite State
Machine (FSM) using D-type latches and a 2-phase clock. Note that the r output
reflects whether the FSM is in an accepting state, the rst input resets the FSM into the
start state, and the Xi input drives transitions between states: the idea is that the i-th
element of a sequence

X = ⟨X0 , X1 , . . . , Xn−1 ⟩

is provided as input, via Xi , in the i-th step. Assuming the entirety of X is consumed,
which of the following
A. r = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1
B. r = X0 ∧ X1 ∧ · · · ∧ Xn−1
C. r = X0 ∧ X1 ∧ · · · ∧ Xn−1
D. r = X0 ∨ X1 ∨ · · · ∨ Xn−1
E. r = X0 ∨ X1 ∨ · · · ∨ Xn−1
best describes the output from, or functionality of the FSM?
[3 marks]
Solution: Basically this question is asking us to reverse engineer the FSM implementation
into a design and hence functionality; to do that, we can step backwards through
the process that would normally step forwards.
The first step is therefore to inspect the implementation and extract pertinent features:
1) the bottom and top D-type latches capture the 1-bit current and next states,
i.e., Q and Q′ respectively, 2) between the two we can identify an output function
r = ω(Q) = ¬Q and a transition function Q′ = δ(Q, Xi , rst) = (¬rst) ∧ (¬Xi ∨ Q). Note
that we can classify this as a Moore-type FSM, since the output r is determined by the
current state Q alone.
The next step is to reconstruct a concrete, tabular description of the FSM, i.e., a truth
table, using ω and δ:
δ ω
rst Xi Q Q′ r
0 0 0 1 1
0 0 1 1 0
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 0 0 1
1 1 1 0 0
Because Q and Q′ are each represented by a single D-type latch, we can infer the FSM
has (at most) two states. Other assignments are possible provided we are consistent,
but the most natural would be to say Q = 0 ↦ S0 and Q = 1 ↦ S1 . Given that rst = 1
forces Q′ = 0, we can infer that S0 is an initial state; given that r = ¬Q and so r = 1
iff Q = 0, we can infer that S0 is an accepting state.
The next step is to reconstruct an abstract, diagrammatic description of the FSM: the
start state S0 has a self-loop for Xi = 1 and a transition to S1 for Xi = 0, while S1 has
a self-loop for both Xi = 0 and Xi = 1.
The final step demands some creativity, in the sense that we need to interpret the
functionality realised: although doing so is not trivial, we can approach it by trying to
explain in words what the FSM does step-by-step. For example, note that the FSM
starts in state S0 and stays there while the input is Xi = 1. However, as soon as it
encounters an input st. Xi = 0 it will transition to state S1 : it stays there whether the
input is Xi = 0 or Xi = 1. So, put another way, the FSM computes

r = X0 ∧ X1 ∧ · · · ∧ Xn−1 .
Q12. Figure 7 outlines, at a high level, a 4-register counter machine implementation; Figure 8
completes said implementation, detailing internals of the decoder component. Note that
the multiplexer inputs should be read left-to-right, and use zero-based indexing. Using
the left-most multiplexer in the decoder as an example, if the 3-bit control-signal derived
from inst is 001(2) = 1(10) then the 1-st input is selected; this means the output is 2(10) .
Which of the following
A. Li : if R3 = 0 then goto L9 else goto Li+1
B. Li : if R3 + 1 = 0 then goto L9 else goto Li+1
C. Li : R3 ← R3 + 1 then goto Li+1
D. Li : R3 ← 0 then goto Li+1
E. None of the above
describes the semantics of a machine code instruction 100111001(2) for this counter
machine? [3 marks]
as output. Looking then at the data- and control-path to assess how these outputs are
used to execute the instruction, we conclude that
• since jmp = 0(10) , the program counter is incremented as normal (i.e., not set to
target, which in fact is unused).
In general then, this instruction writes 0 into register Raddr ; given addr = 3 here, the
semantics are best described by option D, i.e., Li : R3 ← 0 then goto Li+1 .
ADD r0, r1: r0 ← r0 + r1; set the carry flag (CF) if the result overflowed.
RCR r0: right-shift r0 through the carry flag (CF): set the new CF to the least-significant
(0-th) bit of the old r0, right-shift the old r0 by 1 bit to produce a new r0, set the
most-significant (7-th) bit of the new r0 to the old CF.

add r0, r1
rcr r0
Q14. A stack data structure is often used to support subroutine calling. For a subroutine
written using a high level language (such as C), who or what is responsible for allocating
space on the stack for any local variables used?
A. The programmer.
B. The compiler.
C. The loader.
D. The linker.
E. The special registers.
[1 mark]
What is the average number of clock cycles per instruction (CPI) for this program?
A. 2.40
B. 14.12
C. 2.08
D. 43.53
E. 0.17
[2 marks]
Solution: The CPI is computed as the weighted average of the instruction cycles where
the weights are the frequencies.
CPI = (4 × 20 + 4 × 15 + 8 × 5 + 1 × 60) / 100 = 2.4
Q16. In a given computer system, accesses to main memory by the processor are supported
by a 16 KiB (16 384 = 2^14 bytes) direct mapped cache. Memory addresses are 16 bits,
and each addressable element has a word size of 4 bytes. Cache blocks are of size
64 bytes. Locations in the cache are numbered starting at 0.
Consider the memory address 0101010110101010 (decimal 21 930) where the least-
significant bit is on the right-hand side.
In which cache location is the data stored at this address placed, and what is the value
in the Tag store?
A. Location 01011010 (decimal 90), Tag store 0101.
B. Location 10101010 (decimal 170), Tag store 01010101.
C. Location 1010 (decimal 10), Tag store 01010101.
D. Location 01011010 (decimal 90), Tag store 1010.
E. Location 10101010 (decimal 170), Tag store 0101.
[3 marks]
Solution: Cache blocks are of size 2^6 bytes, but the memory is addressable in words of
4 bytes. Therefore, there are 2^6/2^2 = 2^4 words per cache line; hence the lowest 4 bits
of the memory address are used to index within a cache line. Note, the lowest 4 bits of
the memory address are safe to ignore in this question.
There are 2^14/64 = 2^14/2^6 = 2^8 locations in cache. Therefore, 8 bits of the memory
address are used to determine the location. They are the 5th–12th least-significant
bits, inclusive.
Finally, the remaining 4 most-significant bits are stored in the Tag.
Q17. Compute the two’s complement addition of the following two 8-bit numbers:
• 01100100
• 10111000
01100100
10111000
--------
00011100 result
11100000 carry out
int i, j;
int A[80];

for (i = 0; i < 8; ++i) {
  for (j = 0; j < 10; ++j) {
    int k = i * 10 + j;
    A[k] = j * 10 + i;
  }
}
int i, j;
int A[80];

for (i = 0; i < 8; ++i) {
  int l = i * 10 + 0;
  int m = 0 * 10 + i;
Now, applying it to the i loop. Note that in the above m == i, so we don’t need to do
anything to it, and the optimisation is only applied to l.
int i, j;
int A[80];
int p = 0;

for (i = 0; i < 8; ++i) {
  int l = p;
  int m = i;
  for (j = 0; j < 10; ++j) {
    A[l] = m;
    l = l + 1;
    m = m + 10;
  }
  p = p + 10;
}
There are therefore 10 × 2 additions in the j loop. Each iteration of the i loop performs
the 20 additions for the j loop and 1 more, for a total of 8 × 21 = 168 additions, and
zero multiplications.
Q19. Which of the following might an ISA include to support system calls:
A. Timing interrupts.
B. Hardware interrupts.
C. Kernel scheduler.
D. Software interrupts.
E. Bus.
[1 mark]
Solution: A test of comprehension of the OS lecture, which covered all these topics.
The correct answer is software interrupts, as the processor needs a mechanism to pause
execution to refer to the OS on demand.
Question Q20, Question Q21 and Question Q22 all refer to the following computer system:
The system supports up to 2^32 bytes (4 GiB) of byte-addressable virtual memory. There are
2^29 bytes (512 MiB) of physical memory installed. The memory is split in pages of size 1 KiB
(2^10 bytes).
Solution: There are 2^(32−10) = 2^22 pages, and so the Page Table needs this many entries.
Solution: We can store 2^(29−10) = 2^19 pages in physical memory, and so each entry of
the page table stores 19 bits to identify the physical addresses.
Q22. The ISA for this computer system uses 32-bit memory addresses. Which bits are used
unmodified in the translated physical address?
A. The 19 least-significant bits.
B. The 10 least-significant bits.
C. The 10 most-significant bits.
Q23. During the assembly of an assembly program into machine code, labels are resolved.
Consider the following Hex 8 assembly program:
Line 1 - BR L1
Line 2 - L2
Line 3 - DATA 1
Line 4 - L3
Line 5 - DATA 1
Line 6 - L1
Line 7 - LDBM L2
Line 8 - LDAM L3
Line 9 - STAM L2
Line 10 - ADD
Line 11 - STAM L3
Line 12 - LDBC 8
Line 13 - SUB
Line 14 - BRN L1
During label resolution, the labels on lines 7, 8, 9, 11, and 14 are resolved. What constant
values are they replaced with, respectively? Assume no extra instructions are created.
Recall the Hex 8 execution cycle: Fetch, Increment program counter (PC), Execute.
You may assume that the PC is set to the line number when that line is fetched.
A. 2, 4, 2, 4, -8
B. 3, 5, 3, 5, 8
C. -4, -3, -6, -6, -8
D. 2, 4, 2, 4, 8
E. 3, 5, 3, 5, -8
[2 marks]
Solution: This Hex 8 program generates the Fibonacci numbers in memory up to 8.
L1 refers to the instruction on Line 7. L2 refers to the instruction on Line 3. L3 refers
to the instruction on Line 5.
The LDAM, LDBM, and STAM instructions all take immediate values for the label. Hence,
L2 = 3 on lines 7 and 9, and L3 = 5 on lines 8 and 11.
The BRN instruction updates the PC, and so the relative offset must be used. The PC
is incremented after fetch, and before execute, so the PC is one greater than the line
number. Hence, L1 resolves to 7 − 15 = −8 on Line 14.
Q24. A bus is used to communicate data between the CPU and memory, which operate on
different clocks frequencies. A number of signal/control wires are used to implement a
communication protocol between the CPU and memory.
The CPU requests data from memory by sending the appropriate signals for a read. How
does the CPU know that the data has arrived and it is safe to copy from the bus?
A. The data has arrived after the memory signals it is ready.
B. The data has arrived after a fixed number of CPU cycles.
C. The data has arrived after a fixed number of memory cycles.
D. The data has arrived after the CPU signals it is ready.
E. The data arrives after a fixed number of both CPU and memory cycles.
[1 mark]
Solution: This question tests the different situations where synchronous and asyn-
chronous buses are used, and how they work at a high level. The key detail is that
the CPU and memory have different clocks, and so an asynchronous bus must be used,
which operates on control signals rather than against the clock. Therefore, the data
arrives once the memory tells the CPU it’s on the bus.
Q25. Many ISAs, including x86 and Arm, provide extensions to support SIMD vector instruc-
tions. These instructions perform the same operation on each element of the vector.
Two input programs are written, both performing the same total number of computations
(arithmetic operations). One uses SIMD vector instructions and the other does not.
Which of these statements is true?
A. The program which uses vector instructions results in fewer instructions to be
fetched and decoded.
B. The program which uses vector instructions results in more instructions to be
fetched and decoded.
Q26. A ripple-carry adder can be turned into an adder-subtractor by converting one of the
inputs to its two's complement. The control signal is connected to the carry-in bit of the
first 1-bit full adder. Which logic gate can be used to combine each bit of one of the inputs
with the control signal to complete the conversion in order to compute subtraction?
A. AND gate.
B. OR gate.
C. XOR gate.
D. NOT gate.
E. NAND gate.
[1 mark]
Additional figures and tables
[Figure 1: five candidate Karnaugh maps for f(w, x, y, z), labelled A to E; w and x index the columns (00, 01, 11, 10) and y and z the rows.]
Li : if Raddr = 0 then goto Ltarget else goto Li+1 ↦ 010 addr target
(bit indices, left to right: 8 7 6 | 5 4 | 3 2 1 0)
R-type: opcode [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
I-type: opcode [31:26] | rs [25:21] | rt [20:16] | imm [15:0]
J-type: opcode [31:26] | imm [25:0]
[Circuit diagram: transistors m0–m9 between Vdd and Vss, intermediate signals t0 and t1, output r.]
Figure 4: A combinatorial logic design, described using N-type and P-type MOSFET transis-
tors.
[Circuit diagram: two D-type flip-flops with en and ¬Q outputs, a constant input 1, clock clk, intermediate signals t0–t3, and output r.]
Figure 5: A sequential logic design, containing two D-type flip-flops.
[Circuit diagram: a D-type latch with enable φ1, inputs rst and Xi, feedback from ¬Q, and output r.]
Figure 6: Implementation of a simple FSM, using D-type latches and a 2-phase clock.
[Datapath diagram: registers R0, . . . , Rr−1, PC, MEM, and IR, each a D-type register with rst and en inputs; control signals op, jmp, cmp, wr, halt, addr, target, and inst; enables driven by ¬halt ∧ Φ1 and ¬halt ∧ Φ2.]
Figure 7: The high-level data- and control-path for an example 4-register counter machine.
[Decoder diagram: multiplexers driven by inst8, . . . , inst0, producing the halt, jmp, merge, target (from inst3,...,0), addr (from inst5,...,4), wr, and op (selected by inst8,...,6) control signals.]
Figure 8: The low-level decoder implementation for an example 4-register counter machine.
END OF PAPER