




University of Maryland

Electrical and Computer Engineering Department

College Park, MD 20742-3285


Glenn L. Martin Institute of Technology

A. James Clark School of Engineering

Dr. Charles B. Silio, Jr.

Telephone 301-405-3668
Fax 301-314-9281
silioeng.umd.edu

The Microarchitecture/Microprogramming Level

These notes are based on and extend material in Chapter 4 of A. S. Tanenbaum, Structured
Computer Organization, 3rd Edition, Prentice Hall, 1990. The accumulator-based machine whose
instruction set architecture is called the Mac-1 has its data path microarchitecture and its
microprogrammed implementation (called the Mic-1) presented here. This presentation differs from
the stack-oriented IJVM and corresponding Mic-1 in Tanenbaum's 5th Edition textbook.
One of the differences between the Mic-1 (microprogrammed computer) presented here and the
one in the 5th Edition textbook is that all registers in this Mic-1 are constructed from clocked (or
gated) D-latches, as shown in Fig. 1, whereas registers in the 5th Edition text use edge-triggered
flip-flops. Fig. 2 shows how an 8-bit register is built using clocked D-latches and three-state (i.e.,
tri-state) buffers for connection to two output buses.
[Figure: logic diagram of a D-latch with inputs D and CLK and output Q]

Figure 1: Clocked D-latch

[Figure: eight D-latches (bits b7 to b0) loaded in parallel from the C-Bus under control of Load
and CLK; each latch output Q drives one A-Bus wire and one B-Bus wire through tri-state buffers
enabled by OE-A and OE-B, respectively]

Figure 2: Eight-bit register and bus connections

Registers: A register is a device capable of storing information. Conceptually, registers are
the same as main memory, the difference being that the registers are located physically within
the processor itself, so they can be read from and stored into faster than words in main memory,
which is usually off-chip. Larger and more expensive machines usually have more registers than
smaller and cheaper ones, which must use main memory for storing intermediate results. On some
computers a set of registers numbered 0, 1, 2, ..., n - 1 is available at the microprogramming
level and is called local storage or scratchpad storage.
A register can be characterized by a single number: namely, how many bits it can hold (e.g.,
Fig. 2 is an 8-bit register). The bits (binary digits) in an n-bit register could be numbered from
left to right or from right to left. The numbering convention assumed in these notes for the bits in
an n-bit register is right to left from 0 to n - 1, in the natural powers-of-two order of a positional
number system for integers. In other words, bit 0 is stored in the rightmost D-latch in Fig. 2 and
bit 7 is stored in the leftmost D-latch (which corresponds to bit n - 1 when n = 8).
Information placed in a register remains there until some other information replaces it. The
process of reading information out of a register does not affect the contents of the register. In other
words, when a register is read, a copy is made of its contents and the original is left undisturbed in
the register. Similarly, when information is moved from one register to another, a copy is loaded
into the destination register and the contents of the source register remain undisturbed.

Buses: A bus is a collection of wires used to transmit signals in parallel. For example, buses are
used to allow the contents of one register to be copied to another one. A bus may be unidirectional or
bidirectional. A unidirectional bus can transfer data in only one direction, whereas a bidirectional
bus can transfer data in either direction but not both simultaneously. Unidirectional buses are
typically used to connect two registers, one of which is always the source and the other of which is
always the destination. Bidirectional buses are typically used when any of a collection of registers
can be the source and any other one can be the destination.
Many devices have the ability to connect and disconnect themselves electrically from the buses
to which they are physically attached. These connections can be made or broken in nanoseconds.
A bus whose devices have this property is called a tri-state (or three-state) bus (the term tri-state
being a registered trademark of National Semiconductor Corp.). A tri-state buffer amplifier is used
to make the connections. These tri-state buffer amplifiers are shown in Fig. 2 as triangular shapes
whose inputs come from the output of the D-latch to which each is connected and each of whose
outputs is connected to a single bus wire. The other input to the buffer amplifier (labeled either
OE-A or OE-B) is a control (or enable) input. If this control input is in the logic zero state, then
the output of its buffer amplifier is in the high-impedance state (i.e., disconnected from the bus
wire to which it is attached). If the control input is in the logic one state (the input is active-high),
then the buffer amplifier's output value equals its input value (either logic 0 or logic 1), and the
D-latch's output state is connected to the corresponding bus wire.
In most microarchitectures, some registers are connected to one or more input buses and to
one or more output buses. Fig. 2 depicts an 8-bit register connected to one input bus and to two
output buses. The register has three control inputs: namely, Load, OE-A, and OE-B, where OE
stands for "output enable." When "Load" is in the logic zero state, the contents of the register
are not affected by the signals on the C-Bus wires. When "Load" is raised to the logic 1 state, the
values on the C-Bus wires are copied into their corresponding D-latches in parallel. After the new
values are latched, "Load" can be returned to its logic zero state, and the register remembers the
binary value last loaded into it.
When "OE-A" is at the logic zero level, the register is disconnected from the A-Bus (and
similarly for "OE-B" with respect to the B-Bus). When "OE-A" is raised to the logic 1 level, the
register is connected to the A-Bus wires (and similarly for "OE-B" with respect to the B-Bus).
To transfer data from this register to another register R using the A and C buses, the
input to register R must be connected to the C-Bus, and "OE-A" for this register must be raised to
the logic 1 level in order to place the register's contents on the A-Bus. Other circuitry, such as an
arithmetic and logic unit (ALU), which is not shown here, must then be used to connect the A-Bus
wires to the C-Bus wires. After a short time to allow the signals on the buses to settle down and
become stable, the Load signal connected to register R is raised to the logic 1 level and the
information transfer is accomplished.
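
The register-to-register transfer just described can be modeled in a few lines of Python. This is only a behavioral sketch: the names Register and bus_value are hypothetical, the high-impedance state is represented by None, and the ALU is replaced by a plain copy from the A-Bus to the C-Bus.

```python
class Register:
    """Behavioral sketch of the Fig. 2 register (hypothetical model)."""

    def __init__(self, width=8):
        self.width = width
        self.value = 0  # contents of the D-latches

    def clock(self, load, c_bus):
        # While Load is 1 the latches copy the C-Bus; otherwise they hold.
        if load:
            self.value = c_bus & ((1 << self.width) - 1)

    def drive(self, oe):
        # None stands for the high-impedance (disconnected) state.
        return self.value if oe else None


def bus_value(drivers):
    """Resolve a tri-state bus: at most one connected driver is allowed."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one output enabled")
    return active[0] if active else None


src, dst = Register(), Register()
src.clock(load=1, c_bus=0xA5)                            # preload the source
a_bus = bus_value([src.drive(oe=1), dst.drive(oe=0)])    # only src has OE-A = 1
c_bus = a_bus                # assumption: stand-in for the ALU passing A through
dst.clock(load=1, c_bus=c_bus)                           # raise Load on register R
```

After the last line both registers hold the same value, which matches the statement that reading a register leaves its contents undisturbed.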
Because drawing all of the wires and latches shown in Fig. 2 requires too much space, a
shorthand schematic such as that shown in Fig. 3 is used instead. Fig. 3 depicts a 16-bit register
that would be constructed internally in the same fashion as the 8-bit register shown in Fig. 2 but
with 8 more latches, 16 more buffer amplifiers, and 24 more bus wires.
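[Figure: block symbol for a 16-bit register with a 16-bit input from the C-Bus, 16-bit outputs to
the A-Bus and to the B-Bus, and control inputs Load, Clk, OE-A, and OE-B]

Figure 3: Sixteen-bit register schematic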


Decoders and Multiplexers: Circuits that have one or more input lines and compute one or
more output values that are uniquely determined by the present inputs are called combinational
circuits. Two important combinational circuits are decoders and multiplexers. A decoder has
n input lines and 2^n output lines numbered 0 to 2^n - 1. If the binary number on the input lines
has decimal value k, then output line number k takes the value 1 and all other output lines take
the value 0. A decoder always has exactly one output line whose value is set to 1, with all the rest
set to 0. A multiplexer has 2^n data inputs (either individual lines or buses), one data output of
the same width as the inputs, and an n-bit control input that is internally decoded to select one of
the inputs and route it to the output. The structure of a 2 to 1 multiplexer is shown in Fig. 4. If
instead of a single input line one wishes to switch the contents of one of two n-bit input buses to
an n-bit output bus, then one must use n such 2 to 1 multiplexers (one per output bit line), all
selected by the same selection input value S.
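
The behavior of both circuits can be sketched in Python as follows. The function names are hypothetical, and the gate-level form Z = (NOT S AND I0) OR (S AND I1) is the usual 2 to 1 multiplexer equation, assumed here to match Fig. 4 (I0 selected when S = 0).

```python
def decoder(k, n):
    """n-to-2**n decoder: output line k is 1, every other line is 0."""
    if not 0 <= k < 2 ** n:
        raise ValueError("input value does not fit in n bits")
    return [1 if line == k else 0 for line in range(2 ** n)]


def mux2_bit(i0, i1, s):
    """One-bit 2 to 1 multiplexer: Z = (NOT S AND I0) OR (S AND I1)."""
    return ((1 - s) & i0) | (s & i1)


def mux2_bus(bus0, bus1, s, n):
    """Switch one of two n-bit buses using n one-bit multiplexers sharing S."""
    z = 0
    for bit in range(n):
        z |= mux2_bit((bus0 >> bit) & 1, (bus1 >> bit) & 1, s) << bit
    return z


lines = decoder(5, 3)                        # only output line 5 is set
chosen = mux2_bus(0x1234, 0xABCD, 1, 16)     # S = 1 selects the second bus
```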

[Figure: gate-level circuit and block symbol of a 2 to 1 MUX with data inputs I0 and I1,
selection input S, and output Z]

Figure 4: 2 to 1 Multiplexer (one for each output bit when used with registers)
An Example Microarchitecture

The data path of our example microarchitecture is shown in Fig. 5. The data path is that
part of the central processing unit (CPU) that contains the arithmetic and logic unit (ALU) and
its inputs and outputs. In this case it contains 16 identical 16-bit registers, labeled PC, AC, SP,
and so on, that form a scratchpad memory accessible only to the microprogramming level.
The registers labeled 0, +1, and -1 will be used to hold the indicated constants (with -1 in two's
complement form). The meaning of the other register names will be explained later. Each register
can output its contents onto one or both of two internal buses, the A-Bus and the B-Bus, and each
can be loaded from a third internal bus, the C-Bus, as shown in the figure.
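[Figure: the 16 scratchpad registers (PC, AC, SP, IR, TIR, 0, +1, -1, AMASK, SMASK, and
registers 10 through 15) between the C-Bus and the A-Bus and B-Bus; the A-Bus, B-Bus, and C-Bus
decoders driven by the 4-bit A, B, and C fields, with Enc and T4 gating the C-Bus decoder; the
A-Latch and B-Latch loaded at T2; the MAR (loaded at T3) and MBR (loaded at T4) connecting to
main memory (highest address 4095) with control lines Rd and Wr; the AMUX, the ALU (control
lines F1 and F0, status outputs N and Z), and the shifter (control lines S1 and S0); the micro
sequencing logic, the MMUX, and the 4-phase clock generator with outputs T1 to T4]

Figure 5: The data path for example microarchitecture (Mic-1/Mac-1)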

The A and B buses respectively feed the left and right inputs of a 16-bit wide ALU that can
perform four functions: addition (A + B), bitwise logical AND (A.AND.B), left input
straight-through (A), and bitwise logical complement (i.e., 1's complement) of the content of the left input
(NOT A). The function to be performed is specified by the two ALU control lines F1 and F0. The
ALU generates two status bits based on the current ALU output: N, which takes the value 1 when
the ALU output is negative, and Z, which takes the value 1 when the ALU output is zero. The
N bit is just a copy of the high-order (bit position 15) output bit. The Z bit is the NOR of all the
ALU output bits (namely, bits 0 through 15).
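
As an illustration, the four ALU functions and the two status bits can be sketched in Python. The function name alu is hypothetical, and the numeric function codes (0 = A + B, 1 = A AND B, 2 = A, 3 = NOT A) are the encodings of F1 F0 given in the microinstruction format later in these notes.

```python
MASK = 0xFFFF  # the data path is 16 bits wide


def alu(f, a, b):
    """Return (output, N, Z) for function code f = F1 F0."""
    if f == 0:
        out = (a + b) & MASK      # addition, carry out of bit 15 discarded
    elif f == 1:
        out = a & b & MASK        # bitwise AND
    elif f == 2:
        out = a & MASK            # left input straight through
    elif f == 3:
        out = ~a & MASK           # 1's complement of the left input
    else:
        raise ValueError("ALU function code must be 0..3")
    n = (out >> 15) & 1           # copy of the high-order output bit
    z = 1 if out == 0 else 0      # NOR of all 16 output bits
    return out, n, z


wrapped = alu(0, 0xFFFF, 0x0001)   # -1 + 1 in two's complement
inverted = alu(3, 0x0000, 0)
```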
The 16-bit ALU output goes into a shifter, which is a combinational circuit that can logically
shift its input 1 bit left or right, or not at all, and gate the result to its 16-bit output. The function
to be performed by the shifter is specified by the two shifter control lines S1 and S0. It is possible
to perform a 2-bit left shift of a register, R, by using the ALU to compute R + R (which is a 1-bit
left shift) and then shifting this sum another bit left using the shifter.
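
The 2-bit left shift can be checked with a small Python sketch. The name shifter is hypothetical, and the codes (0 = no shift, 1 = right shift, 2 = left shift) are the S1 S0 encodings listed with the microinstruction format later in these notes.

```python
MASK = 0xFFFF  # 16-bit data path


def shifter(s, value):
    """Logical shifter: s = 0 no shift, 1 right 1 bit, 2 left 1 bit."""
    value &= MASK
    if s == 0:
        return value
    if s == 1:
        return value >> 1            # a zero enters at bit 15
    if s == 2:
        return (value << 1) & MASK   # bit 15 is shifted out and lost
    raise ValueError("shifter code 3 is not used")


r = 0x0003
doubled = (r + r) & MASK             # ALU computes R + R: a 1-bit left shift
result = shifter(2, doubled)         # the shifter supplies the second bit
```

Here result equals the original value shifted left by two bit positions.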
The A-Bus decoder is used to decode a 4-bit register designator (A-field) that selects one of
the 16 scratchpad registers to be gated onto the A-Bus. The outputs of the decoder are 16 output
enable (OE-A) signals (one for each register), and one and only one of the OE-A signals takes the
value 1. The B-Bus decoder is used to decode a 4-bit register designator (B-field) that selects one
of the 16 scratchpad registers to be gated onto the B-Bus. The outputs of the decoder are 16
output enable (OE-B) signals (one for each register), and one and only one of the OE-B signals
takes the value 1. The C-Bus decoder is used to decode a 4-bit C-field register designator that
selects the scratchpad register to be loaded from the C-Bus. The outputs of the C-Bus decoder
are 16 load clock signals (one for each register). Because all 16 possible C-field values are assigned
to the 16 registers, an additional control input is needed to prevent loading any of the registers.
This additional control input is ENC (for enable-C). If ENC = 0, then all 16 of the decoder's load
outputs remain at logic level zero, and none of the registers is overwritten. If ENC = 1, then
one and only one of the destination registers sees a load clock line = 1 at the appropriate time
determined by yet another control input called T4.
Neither the A-Bus nor the B-Bus feeds the ALU directly. Instead, each one feeds a latch (i.e.,
a register) that in turn feeds the ALU. The latches are needed because the ALU is a combinational
circuit: it continuously computes the output for the current input and function code. Feeding
the left and right ALU inputs directly from the A and B buses (without the additional latches)
can cause race problems. For example, consider assigning to the destination register A the sum
of the contents of registers A and B, denoted A := A + B. As A is being written into, the value
on the A-Bus begins to change, which causes the ALU output and thus the contents of the C-Bus
to change as well. Consequently, the wrong value may be stored into A. In other words, in the
assignment A := A + B, the A on the right-hand side is the original A value, not some bit-by-bit
mixture of the old and new values. By inserting latches (namely, the A-latch and B-latch) into the
A and B buses, we can freeze the original A and B values there early in the cycle, so that the ALU
is shielded from changes on the buses as the new value is being stored into the scratchpad.
One can think of the A-latch and the B-latch as shared slave latches for the correspondingly
selected source master latches in the scratchpad. This saves the slave latch in each scratchpad
register that building the registers from master-slave flip-flops would require, but it complicates the
timing somewhat. The A-latch and B-latch are loaded by timing control signal T2, which is generated
by a 4-phase clock generator circuit shown in Fig. 6.
Computer circuits are normally driven by a clock, a device that emits a periodic sequence of
pulses. These pulses define machine cycles. During each machine cycle, some activity occurs, such
as the execution of a microinstruction. It is often useful to divide a cycle into subcycles so different
parts of the microinstruction can be performed in a well-defined order. For example, the inputs to
the ALU must be made available and allowed to become stable before the output can be stored.

[Figure: the four-phase clock generator, a finite state machine with inputs Master Clock Pulses,
Run/Stop, and Reset and outputs T1, T2, T3, and T4, together with a timing diagram of the master
clock and T1 to T4 in which each CPU cycle consists of four subcycles; the cycle repeats while
Run = True]

Figure 6: Four Phase Clock Cycle


Main Memory: Processors need to be able to read data from memory and write data to
memory. Most computers have an address bus, a data bus, and a control bus for communication
between the CPU and memory. To read from memory, the CPU puts a memory address on the
address bus and sets the control signals appropriately, for example by asserting "Rd" (READ). The
memory then puts the requested item on the data bus. In some computers memory read/write is
synchronous; that is, the memory must respond within a fixed time. This is what we assume for
our microarchitecture; namely, the memory must respond within four clock (subcycle) ticks. On
other computers, the memory may take as long as it wants, signaling the presence of data using a
control line (e.g., READY or memory function complete) when it is finished.
Writes to memory are done similarly. The CPU puts the data to be written on the data bus and
the address to be stored into on the address bus, and then it asserts "Wr" (WRITE). (An alternative
to having "Rd" and "Wr" is to have MREQ, which indicates that a memory request is desired, and
R/W, which distinguishes read from write. In either case two control lines are required.)
On most machines (except for our example) a memory access is nearly always considerably longer
than the time required to execute a single microinstruction. Consequently the microprogram must
keep the correct values on the address and data buses for several microinstructions (i.e., machine
cycles). To simplify this task, it is often convenient to have two registers, the MAR (Memory
Address Register) and the MBR (Memory Buffer Register), that drive the address and data buses,
respectively. Both registers sit between the CPU and the system's memory bus. The address bus
is unidirectional on both sides and is loaded from the CPU side when the "Mar" control line is
asserted. The output to the system address lines is either always enabled (as is the case here) or
enabled only during reads and writes, which requires an output enable line driven by the OR of "Rd"
and "Wr" (not shown). Because the main memory in our example microarchitecture has only 4096
16-bit words, the MAR is a 12-bit register. In our example microarchitecture the MAR is connected
to the B-Bus (rather than to the C-Bus) so that both the MAR and the MBR can be loaded in
the same machine cycle (i.e., loaded by the same microinstruction). Because the MAR is connected
to the B-Bus (rather than to the C-Bus) and because of the way the microprogram controlling
this machine is written, it is restricted in size to 12 bits. To allow for a 16-bit MAR connected
in this way would require two machine cycles (i.e., two microinstructions) and the use of another
scratchpad register to properly load all 16 bits into the MAR. This is a consequence of the design
choices made by others; namely, the author of the text from which this example is derived. A more
detailed schematic of the MAR and its connections is shown in Fig. 7.
[Figure: the 12-bit MAR takes the low-order 12 of the 16 B-Bus lines; its Load input is derived
from T3 and the MAR control bit; its output (OE = 1) drives the 12 lines to the memory address
decoder]

Figure 7: Memory Address Register (MAR)


As shown in Fig. 8, the "Mbr" control line causes the MBR to be loaded from the C-Bus on
the CPU side. The MBR output is always enabled on the CPU side and is presented to a 2 to 1
multiplexer (the AMUX) that switches the input to the left ALU input between the A-latch and
the MBR under control of a signal called Amux. If Amux = 0, the left input of the ALU sees the
contents of the A-latch, and if Amux = 1, it sees the contents of the MBR. The system's memory
data bus is bidirectional, and the "Rd" and "Wr" control signals are used to determine its direction
between memory and the MBR (to memory on write and from memory on read).
[Figure: the 16-bit MBR loaded through a 2:1 MUX (x16) that selects between the C-Bus (from the
shifter) and the 16-bit data lines from memory; the MBR control bit, T4, RD (Read), and WR (Write)
control loading and the direction of the 16-bit connection to and from memory; the MBR output
(OE = 1) goes to the AMUX]

Figure 8: Memory Buffer Register (MBR)

To control the data path of our microarchitecture in Fig. 5 requires 60 signals that belong to
the following nine functional groupings:
    16 signals to control loading the A-Bus from the scratchpad
    16 signals to control loading the B-Bus from the scratchpad
    16 signals to control loading the scratchpad from the C-Bus
     1 signal to control loading the A and B latches
     2 signals to control the ALU function
     2 signals to control the shifter
     4 signals to control the MAR and MBR
     2 signals to indicate memory read or memory write
     1 signal to control the AMUX
Given the values of the 60 signals, we can perform one cycle of the data path. A cycle consists of
gating values onto the A and B buses, latching them in the two bus latches, running the values
through the ALU and shifter, and finally storing the results in the scratchpad and/or the MBR. In
addition, the MAR can also be loaded, and a memory cycle initiated. As a first approximation we
could have a 60-bit control register, with one bit for each control signal. A 1 bit means that the
signal is asserted and a 0 means that it is not asserted (i.e., negated). We also need a multiphase
clock generator circuit to control when things happen during the cycle.
However, at the price of a small increase in circuitry, we can greatly reduce the number of bits
needed to control the data path. Using all 16 bits to control the A-Bus would allow 2^16 combinations
of signal values, only 16 of which are valid because the scratchpad has only 16 registers. Therefore,
we can encode the A-Bus control information in a 4-bit field and use a decoder to generate the 16
control signals. The same holds true for the B-Bus.
The situation is slightly different for the C-Bus. In principle, multiple simultaneous stores
into the scratchpad are feasible, but in practice this feature is only infrequently useful, and most
hardware designs do not provide for it. Therefore, we will also encode the C-Bus control into a
4-bit field. Having encoded some of the control signals into fields and in turn supplied corresponding
decoder circuits, we have saved 3 x 12 = 36 bits. We now need only 24 bits to control the data path.
Because the A and B latches are always loaded at a certain point in time, we can supply a multiphase
clock generator circuit and use one of the clock phases (say T2) as this control input, leaving 23
control bits needed. After the values in the A and B latches settle down, the MAR can be clocked
(at subcycle time T3) to copy the content of the B latch if the "Mar" control bit is set to 1. More
time is needed, however, for the data signals to propagate through the ALU and shifter circuitry
before they have settled down and can be copied from the C-Bus into their destination(s) in the
scratchpad or MBR at subcycle time T4. One additional signal that is not strictly required, but is
often useful, is one to enable/disable storing the C-Bus into the scratchpad. In some situations one
merely wishes to perform an ALU operation to generate the N and Z signals, but does not wish to
store the result. With this extra bit, which we will call ENC (ENable C), we can indicate that the
C-Bus contents are to be stored (ENC = 1) or not (ENC = 0).
With ENC included we can control the data path with a 24-bit number. Now we note that "Rd"
and "Wr" can be used to control the latching of the MBR from the system's memory data bus and
the enabling of the MBR onto it, respectively (as shown in Fig. 8). This observation reduces the
number of independent control signals needed from 24 down to 22.
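
The bit counting in the preceding paragraphs can be retraced with a few lines of Python arithmetic. The variable names are only illustrative; the numbers are the ones given in the text.

```python
# Nine functional groupings of raw control signals.
raw = 16 + 16 + 16 + 1 + 2 + 2 + 4 + 2 + 1

# Encoding the A, B, and C selections as 4-bit fields saves 12 bits each.
encoded = raw - 3 * (16 - 4)

# The A and B latches are loaded by clock phase T2 instead of a control bit.
clocked_latches = encoded - 1

# ENC is added to enable or disable storing the C-Bus into the scratchpad.
with_enc = clocked_latches + 1

# Rd and Wr double as the MBR's memory-side latch and enable controls.
independent = with_enc - 2

# Cross-check against the field widths: AMUX, ALU, SHFT, MBR, MAR, RD, WR,
# ENC, C, B, A.
field_bits = 1 + 2 + 2 + 1 + 1 + 1 + 1 + 1 + 4 + 4 + 4
```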

The next step in the design of the microarchitecture is to invent a microinstruction format
containing 22 bits. Fig. 9 shows such a format with two additional fields, COND and ADDR, which
will be described shortly. The microinstruction contains 13 fields, 11 of which are as follows:
    AMUX - controls left ALU input: 0 = A-latch, 1 = MBR
    ALU  - ALU function: 0 = A + B, 1 = A.AND.B, 2 = A, 3 = NOT A
    SHFT - shifter function: 0 = no shift, 1 = right shift, 2 = left shift
    MBR  - loads MBR from shifter: 0 = don't load MBR, 1 = load MBR
    MAR  - loads MAR from B-latch: 0 = don't load MAR, 1 = load MAR
    RD   - requests memory read: 0 = no read, 1 = load MBR from memory
    WR   - requests memory write: 0 = no write, 1 = write MBR to memory
    ENC  - controls storing into scratchpad: 0 = don't store, 1 = store
    C    - selects register for storing into if ENC = 1: 0 = PC, 1 = AC, etc.
    B    - selects B-Bus source: 0 = PC, 1 = AC, 2 = SP, 3 = IR, etc.
    A    - selects A-Bus source: 0 = PC, 1 = AC, 2 = SP, 3 = IR, etc.
Microinstruction Format (32-bit word)
Number of bits in each field:

    Field:  AMUX  COND  ALU  SHFT  MBR  MAR  RD  WR  ENC  C  B  A  ADDR
    Bits:      1     2    2     2    1    1   1   1    1  4  4  4     8

    AMUX:          0 = A-latch           1 = MBR

    COND (C1 C0):  0 0 = No Jump         0 1 = Jump if N=1
                   1 0 = Jump if Z=1     1 1 = Jump always

    ALU (F1 F0):   0 0 = A + B           0 1 = A and B
                   1 0 = A               1 1 = NOT A

    SHFT (S1 S0):  0 0 = No shift        0 1 = Shift right 1 bit
                   1 0 = Shift left 1 bit    1 1 = (not used)

    MBR, MAR, RD, WR, ENC:  0 = No       1 = Yes

Figure 9: Microinstruction Format (32 bits) for Mic-1/Mac-1 microarchitecture
The ordering of the fields is completely arbitrary. This ordering has been chosen to minimize line
crossings in a subsequent figure. (Actually, this criterion is not as crazy as it sounds; line crossings
in figures usually correspond to wire crossings on printed circuit boards or on integrated circuit
chips, which cause difficulties in two-dimensional designs.)
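
To make the format concrete, the following Python sketch packs and unpacks a 32-bit microinstruction word. It assumes the fields are laid out left to right in the order AMUX, COND, ALU, SHFT, MBR, MAR, RD, WR, ENC, C, B, A, ADDR, with AMUX in the most significant bit; the names FIELDS, pack, and unpack are hypothetical.

```python
# (name, width) pairs, most significant field first (assumed ordering).
FIELDS = [("AMUX", 1), ("COND", 2), ("ALU", 2), ("SHFT", 2),
          ("MBR", 1), ("MAR", 1), ("RD", 1), ("WR", 1), ("ENC", 1),
          ("C", 4), ("B", 4), ("A", 4), ("ADDR", 8)]


def pack(**values):
    """Build a 32-bit word; fields that are not given default to 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word):
    """Split a 32-bit word back into a dict of field values."""
    out, shift = {}, 32
    for name, width in FIELDS:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


# PC := PC + AC: A = 0 (PC), B = 1 (AC), ALU = 0 (add), store into C = 0 (PC).
word = pack(A=0, B=1, C=0, ALU=0, ENC=1)
fields = unpack(word)
```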
Microinstruction Timing: Although our discussion of how a microinstruction can control
the data path during one cycle is almost complete, we have mostly neglected one issue up until
now: timing. A basic ALU cycle consists of setting up the A and B latches, giving the ALU and
shifter time to do their work, and storing the results. It is obvious that these events must happen
in that sequence. If we try to store the C-Bus contents into the scratchpad before the A and B
latches have been loaded, garbage will be stored instead of useful data. To achieve the correct event
sequencing, we use a four-phase clock, that is, a clock with four subcycles, as shown in Fig. 6. The
key events during each of the four subcycles are as follows:
1. Load the next microinstruction to be executed into a register called MIR, the
   MicroInstruction Register.
2. Gate selected scratchpad registers onto the A and B buses and capture them in the A and B
   latches.
3. Now that the inputs are stable, give the ALU and shifter time to produce a stable output
   and load the MAR if required.
4. Now that the shifter output is stable, store the C-Bus contents into the scratchpad and load
   the MBR, if either is required.
Fig. 10 presents a detailed block diagram of the complete microarchitecture of our example
machine. It may look imposing initially, but it is worth studying carefully. When you fully understand
every box and every line on it, you will be well on your way to understanding the microprogramming
level. The block diagram has two parts, the data path on the left, which we have already
discussed in detail, and the control section on the right, which we will now examine.
The largest and most important item in the control portion of the machine is the control store.
This special, high-speed memory is where the microinstructions are kept. On some machines it is
read-only memory (ROM); on others it is read/write memory. In our example, microinstructions
are 32 bits wide and the microinstruction address space consists of 256 words, so the control store
occupies a maximum of 256 x 32 = 8192 bits. By comparison, the Digital Equipment Corporation
(DEC) PDP-11/40 was a popular and commercially successful microprogrammed minicomputer in
the mid 1970's that also had a 256-word control store, but its microinstructions were 56 bits wide.
Like any other memory, the control store needs an MAR and an MBR. In this case we will
call the MAR the MPC (MicroProgram Counter) because its only function is to point to the next
microinstruction to be fetched from the memory for execution. The MBR is just the MIR as
mentioned above. In this microarchitecture the control store and the main memory are different
entities; the control store holds the microprogram and the main memory holds the conventional
machine language program.
From Fig. 10 it is clear that the control store continuously tries to copy the microinstruction
addressed by the MPC into the MIR. However, the MIR is loaded only during subcycle 1, as
indicated by the dashed line from clock output T1 to it. During the other three subcycles of the
clock, it is not affected, no matter what happens to the MPC.
During subcycle 2 (which lasts between the rising edge of T1 and the rising edge of T2) the
MIR becomes stable, and the various fields begin controlling the data path. In particular the A
and B fields select the scratchpad registers to be gated onto the A and B buses, respectively. The
A and B decoder boxes provide for the 4-to-16 decoding of each field needed to drive the OE-A
and OE-B lines at the scratchpad registers (see Fig. 3). Clock signal T2 loads the A and B latches,
which, after their outputs settle, provide stable ALU inputs for all remaining subcycles during the
rest of the cycle. While data are being gated onto the A and B buses, the increment unit in the
control section of the machine computes MPC + 1, in preparation for loading the next sequential
microinstruction during the next cycle. By overlapping these two operations, instruction execution
can be sped up.
In subcycle 3, the ALU and shifter are given time to produce valid results. The AMUX
microinstruction field determines the left input to the ALU; the right input always comes from the
B-latch. Although the ALU is a combinational circuit, the time it takes to compute the sum is
determined by the carry-propagation time, not the normal gate delay. The carry-propagation time
is proportional to the number of bits in the word. While the ALU and shifter are computing, the
MAR is loaded from the output of the B-latch at T3 if the MAR field in the microinstruction is 1.

[Figure: the data path of Fig. 5 (16 CPU registers numbered 0 = PC, 1 = AC, 2 = SP, ..., 15 = F,
with A-Bus, B-Bus, and C-Bus decoders producing 16 OE-A, 16 OE-B, and 16 Load-Reg signals;
A-Latch, B-Latch, MAR, MBR, AMUX, ALU, and shifter) together with the control section: the
4-phase clock generator (inputs Run/Stop and Reset, outputs T1 to T4), the MPC, the increment
unit computing MPC + 1, the MMUX, the 256 words x 32 bits control store (ROM, PROM, EPROM,
EEPROM), the MIR with fields AMUX, COND, ALU, SHFT, MBR, MAR, RD, WR, ENC, C, B, A,
and ADDR, and the micro sequencing logic driven by N, Z, and the two COND bits; Rd and Wr go
to main memory]

Figure 10: The complete block diagram for example microarchitecture (Mic-1/Mac-1)

During the fourth and final subcycle, the C-Bus may be stored back into the scratchpad and
MBR, depending on ENC and MBR. The box labeled "C decoder" takes ENC, T4, and the C field
from the microinstruction as inputs and generates one (or none) of the 16 register load signals.
Internally it performs a 4-to-16 decode of the C field and then ANDs each of these 16 signals with
a signal derived from ANDing subcycle 4 line T4 with ENC. Thus, a scratchpad register is loaded
only if three conditions prevail:
1. ENC = 1.
2. It is subcycle 4 with T4 = 1.
3. The register has been selected by the C field.
The MBR is also loaded during subcycle 4 if MBR = 1.
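
The three load conditions, and the way the four subcycles fit together, can be sketched in Python. This is a simplified model under stated assumptions: the names c_decoder and run_cycle are hypothetical, the microinstruction is passed as a dict of field values, and the memory side (MAR, MBR, AMUX) and the sequencing fields (COND, ADDR) are left out.

```python
MASK = 0xFFFF  # 16-bit data path


def c_decoder(c_field, enc, t4):
    """16 load signals: 4-to-16 decode of C, each ANDed with (ENC AND T4)."""
    gate = enc & t4
    return [gate & int(r == c_field) for r in range(16)]


def run_cycle(regs, mi):
    """One data-path cycle; returns the status bits (N, Z)."""
    # Subcycle 1: the microinstruction is taken to be in the MIR already.
    # Subcycle 2: T2 captures the selected registers in the A and B latches.
    a_latch, b_latch = regs[mi["A"]], regs[mi["B"]]
    # Subcycle 3: the ALU and then the shifter settle.
    alu_out = [(a_latch + b_latch) & MASK, a_latch & b_latch,
               a_latch, ~a_latch & MASK][mi["ALU"]]
    n, z = (alu_out >> 15) & 1, int(alu_out == 0)
    c_bus = [alu_out, alu_out >> 1, (alu_out << 1) & MASK][mi["SHFT"]]
    # Subcycle 4: T4 = 1, so a register loads only if ENC = 1 and C selects it.
    for r, load in enumerate(c_decoder(mi["C"], mi["ENC"], t4=1)):
        if load:
            regs[r] = c_bus
    return n, z


regs = [0] * 16
regs[1], regs[2] = 5, 7
# Register 1 := register 1 + register 2, using the old value of register 1.
flags = run_cycle(regs, {"A": 1, "B": 2, "C": 1, "ALU": 0, "SHFT": 0, "ENC": 1})
```

Because the latches are read before the store in subcycle 4, the sum uses the original contents of register 1, which mirrors the purpose of the A and B latches. With ENC = 0 the same cycle would still produce N and Z but leave every register unchanged.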
Microinstruction Sequencing: The only remaining issue is how the next microinstruction is
chosen. Although some of the time it is sufficient just to fetch the next microinstruction in sequence,
some mechanism is needed to allow conditional jumps in the microprogram in order to enable it to
make decisions. For this reason two fields are provided in each microinstruction; namely, ADDR,
which is the 8-bit address of a potential successor to the current microinstruction, and COND,
which determines whether the next microinstruction is fetched from the control store address that
is one greater than the contents of the current MPC (i.e., MPC + 1) or from the location specified
by the ADDR field. Every microinstruction potentially contains a conditional jump. The decision
to allow for this in the microinstruction format was made because conditional jumps are very
common in microprograms, and allowing every microinstruction to have two possible successors
makes them run faster than the alternative of setting up some condition in one microinstruction
and then testing it in the next.
The choice of address from which the next microinstruction will be fetched is determined by
the box labeled "Micro Sequencing Logic" during subcycle 4, when the ALU output signals N and
Z are valid. The output of this box controls the M multiplexer (MMUX), which routes either MPC
+ 1 or ADDR to the MPC (loaded by clock signal T4), where it will direct the fetching of the next
microinstruction. The desired choice is indicated by the setting of the COND field as follows:
    0 = Do not jump: next microinstruction is taken from MPC + 1
    1 = Jump to ADDR if N = 1
    2 = Jump to ADDR if Z = 1
    3 = Jump to ADDR unconditionally
The Micro Sequencing Logic combines the two ALU bits, N and Z, and the two COND bits C1
and C0 to generate an output that is then used as the selection input to the MMUX. The Boolean
expression for generating the selection signal (Mmux) is:
$$\text{Mmux} = \overline{C_1}\,C_0\,N \lor C_1\,\overline{C_0}\,Z \lor C_1\,C_0 = C_0\,N \lor C_1\,Z \lor C_1\,C_0$$
where $\lor$ means logical OR. In words, the selection control signal to the MMUX is 1 (routing
ADDR to MPC) if C1 C0 is $01_2$ and N = 1, or C1 C0 is $10_2$ and Z = 1, or C1 C0 is $11_2$. Otherwise,
it is 0 and the next microinstruction in sequence is fetched.
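The selection logic can be checked by brute force. The following is a minimal Python sketch, not part of the Mic-1 hardware description; the function names are made up for illustration. It compares the COND table with the canonical and the simplified Boolean expressions over all 16 input combinations.

```python
from itertools import product

def mmux_from_table(c1, c0, n, z):
    """Selection signal read directly from the COND table (1 routes ADDR to the MPC)."""
    cond = (c1 << 1) | c0
    if cond == 0:
        return 0          # never jump
    if cond == 1:
        return n          # jump if N = 1
    if cond == 2:
        return z          # jump if Z = 1
    return 1              # always jump

def mmux_canonical(c1, c0, n, z):
    """Unsimplified form: (not C1) C0 N  or  C1 (not C0) Z  or  C1 C0."""
    not_c1, not_c0 = 1 - c1, 1 - c0
    return (not_c1 & c0 & n) | (c1 & not_c0 & z) | (c1 & c0)

def mmux_simplified(c1, c0, n, z):
    """Simplified form: C0 N  or  C1 Z  or  C1 C0."""
    return (c0 & n) | (c1 & z) | (c1 & c0)

# Exhaustive comparison over every combination of C1, C0, N, Z.
all_agree = all(
    mmux_from_table(*bits) == mmux_canonical(*bits) == mmux_simplified(*bits)
    for bits in product((0, 1), repeat=4)
)
```

If `all_agree` is true, the simplification shown in the expression above loses nothing relative to the COND table.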
Because the MAR is loaded at time T3, the memory control unit will not have enough time to
decode the address specified and either read from or write to it when clock pulse T4 comes along.
In fact, during a memory read the MBR will be loaded with garbage by the first T4 clock pulse
following the loading of the MAR at T3. Hence, if a microinstruction starts a main memory read,
by setting "Rd" to 1, it must also have Rd = 1 in the next microinstruction executed (which may or
may not be located at the next control store address). In other words, "Rd" must be set to 1 in two
consecutive microinstructions in order for the MBR to be loaded with correct data (returning from
main memory) by the second T4 clock pulse following the loading of the MAR at T3. A full four
clock ticks (corresponding to a full microinstruction cycle time) are needed for the main memory to
respond with valid data. Thus, the data become available two microinstructions after the read was
initiated. If the microprogram has nothing else useful to do in the microinstruction following the
one that initiated a memory read (or write), that microinstruction's only task is then to keep Rd =
1 (or for writes Wr = 1). In the same way, a memory write also takes two microinstruction times to
complete. In the microinstruction initiating the write the MAR is typically loaded with the address
into which data will be written at clock pulse T3, and the data to be written are loaded into the
MBR at clock pulse T4. The main memory again needs four clock ticks to decode the address and
complete the write. Thus, "Wr" must be set equal to 1 in two consecutive microinstructions (the
one initiating the write and the one following it in time).
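The two-microinstruction rule for reads can be illustrated with a very small model. The Python sketch below is a simplification made up for illustration (the function name and the list-of-bits representation are assumptions, and no real bus timing is modeled): it marks, for each microinstruction in execution order, whether the MBR holds valid read data after that microinstruction's T4 pulse.

```python
def mbr_valid_after(rd_bits):
    """Given the Rd bit of each executed microinstruction, in order, return a
    list of booleans: True where the T4 pulse of that microinstruction loads
    the MBR with valid data, i.e. Rd was also 1 in the preceding one."""
    valid = []
    previous_rd = 0
    for rd in rd_bits:
        valid.append(bool(rd and previous_rd))
        previous_rd = rd
    return valid

# Rd pattern of an instruction fetch: Rd = 1, Rd = 1, then a use of the MBR.
fetch_pattern = mbr_valid_after([1, 1, 0])

# A broken pattern: Rd is dropped in between, so no T4 pulse sees valid data.
broken_pattern = mbr_valid_after([1, 0, 1])
```

In the fetch pattern the second entry is the only `True`, so the third microinstruction is the first one that can safely use the MBR, which matches the statement that data become available two microinstructions after the read was initiated.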
An Example Macroarchitecture, the Mac-1

We now consider the instruction set architecture of the conventional machine level to be supported by the microprogrammed interpreter running on the machine of Fig. 10. For convenience,
we will call the architecture of the level 2 or 3 machine the macroarchitecture to contrast it with
level 1, the microarchitecture. (We will basically ignore level 3 at this point because its instructions
are largely those of level 2 and the differences are not important here.) Similarly, we will call the
level 2 instructions macroinstructions. Thus, the normal ADD, MOVE, and other instructions
of the conventional machine level will be called macroinstructions. (The point of repeating this
remark is that some assemblers have a facility to define assembly-time "macros" that are in no way
related to what we mean by macroinstructions.) We will sometimes refer to our example level 1
machine as Mic-1 and the level 2 machine as Mac-1.
Stacks: A modern macroarchitecture should be designed with the needs of high-level languages
in mind. One of the most important design issues is addressing. A mechanism must be provided
for saving a current address pointer when a procedure (or function) is called and then returning
back to where it came from in the calling program when exiting the procedure. In some high-level
languages these called procedures are called subprograms, subroutines, or functions, and we will use
these terms interchangeably. A way of passing parameters to the called procedure where the called
procedure will know to look for them also must be made available. The called procedure itself may
need to allocate some memory space for local temporary variables in order to do its work and then be
able to release the allocated space when returning to the calling program. Furthermore, a hardware
mechanism that will conveniently support recursive calls (i.e., procedures calling themselves) is also
desirable. Block structured languages (like Pascal and others) are normally implemented in such a
way that when a procedure is exited, the storage it has been using for local variables is released.
The easiest way to achieve this goal is by using a data structure called a stack.
A stack is a contiguous block of memory containing some data that operates on a last-in
first-out basis much like a stack of cafeteria trays on a spring loaded base. A pointer (usually
implemented by a CPU register) called the stack pointer (SP) is used to point to the current
top of stack location in the region of main memory where the stack is located. Just like with the
cafeteria trays, when a new tray is placed on the stack, its weight pushes down on the spring in the
supporting base. Thus, stacks are sometimes called push-down stacks, and the machine instruction
used to place a new data item or address on the stack is usually called a PUSH instruction. On
the other hand, the instruction used to remove the top item from a stack (and place it elsewhere)
is variously called by different manufacturers a POP instruction or a PULL instruction. With the
cafeteria tray analogy POP likely refers to the spring in the base popping up a notch when the
weight of the top tray is removed. In other contexts PULL is obviously the opposite of PUSH. In
the macroarchitecture described here we will include the instructions PUSH and POP for putting
data items on the stack or getting them off the stack. The register file in Fig. 5 already contains a
register called SP that we can use as the stack pointer register to point to the current top of stack
location in memory. It also has a PC register that we can use as a program counter to point to
where the next machine instruction will be found in memory. The instruction CALL will first push
the content of the PC register onto the stack before jumping off to the called procedure. The jump
to the called procedure is accomplished by overwriting the PC register with a new value, called
the target address (or the entry point of the procedure) and then letting the computer fetch its
next instruction for execution from there. By first saving the PC register contents on the stack
before overwriting the PC with a new target address, the called procedure will be able to return
to the calling program where it left off. The instruction RETURN, when executed by the called
procedure, will simply pop the top of stack entry into the PC register, thus pointing the program
counter back to a location (the return point) in the calling program, and will in effect cause a
jump back to the calling program. The CALL and RETURN instructions then provide a means
for saving and then restoring the contents of the PC register using the stack when entering and
exiting from called procedures.
Although one could name any register in the PUSH and POP instructions as the source of the
data for a push and the destination for the data from a POP, our example machine will implicitly
use only the AC register as the source of data for a PUSH and the destination for a POP. Now
a PUSH must advance the stack pointer by one memory location before writing the contents of
the AC register into the memory location at the top of the stack. One could choose either of the
following options for how to advance the stack pointer: (1) allow the stack to grow upward from
low memory addresses to high memory addresses by incrementing SP on a PUSH; or (2) allow the
stack to grow downward from high memory addresses to low memory addresses by decrementing
SP on a PUSH. Intel has chosen option (2) for the 80X86 architectures and so will we. Because
the stack pointer points to the current top of stack location, a PUSH must first decrement (the
contents of) SP and then copy the contents of the AC to the memory location whose address is in
the SP register. A POP will first copy the contents of the top of stack location into the AC register
and then increment (the content of) the SP register.
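The push, pop, call, and return actions described above can be sketched in a few lines of Python. This is an illustrative model only, not the Mac-1 implementation: the class name `StackSketch`, the memory size of 4096 words, and the initial stack pointer value of 4000 are assumptions chosen for the example.

```python
class StackSketch:
    """Toy model of a downward-growing stack with AC, PC, and SP registers."""

    def __init__(self, initial_sp=4000, memory_words=4096):
        self.m = [0] * memory_words   # main memory, one list entry per word
        self.ac = 0                   # accumulator
        self.pc = 0                   # program counter
        self.sp = initial_sp          # stack pointer (hypothetical start value)

    def push(self):
        self.sp -= 1                  # decrement first ...
        self.m[self.sp] = self.ac     # ... then store AC at the new top of stack

    def pop(self):
        self.ac = self.m[self.sp]     # copy the top of stack first ...
        self.sp += 1                  # ... then increment

    def call(self, target):
        self.sp -= 1
        self.m[self.sp] = self.pc     # save the return address
        self.pc = target              # jump to the procedure entry point

    def retn(self):
        self.pc = self.m[self.sp]     # restore the return address
        self.sp += 1


machine = StackSketch()
machine.ac = 7
machine.push()          # 7 now sits at address 3999
machine.pc = 100
machine.call(200)       # return address 100 saved at 3998, PC becomes 200
machine.retn()          # PC is 100 again, SP back at 3999
machine.ac = 0
machine.pop()           # AC is 7 again, SP back at 4000
```

Because every push is matched by a pop in the reverse order, the stack pointer ends where it started, which is the property that makes nested and recursive calls work.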
In order to permit programs to reserve (or delete) space on the stack for temporary local variables, instructions are needed for decrementing (or incrementing) the contents of the SP register by
variable amounts. Hence, the instruction set will have instructions for incrementing SP (INSP) and
decrementing SP (DESP) which allow the level 2 programmer to specify the variable amount with
an 8-bit constant. Furthermore, instructions for getting at local variables or incoming parameters
on the stack relative to where the SP (or some other register) currently points are also useful;
thus, instructions providing a form of stack relative indexed addressing are also needed so that one
doesn't have to keep moving the stack pointer to get at these items. In other words, the Mac-1
needs an addressing mode that fetches or stores a word at a known distance relative to the stack
pointer (or some equivalent addressing mode). In the Mac-1 these stack pointer relative indexed
addressing mode instructions will be known as load local (LODL), store local (STOL), add local
(ADDL) and subtract local (SUBL); they will allow the level 2 programmer to specify a 12-bit
offset (or base) value and, hence, they will have a memory reference format.
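As a rough illustration of stack-relative addressing, the Python sketch below reserves two local variables with DESP, accesses them with STOL, LODL, and ADDL, and releases them with INSP. The `Mac1State` class, the function names, and the starting SP value of 4000 are assumptions made for this example; only the register-transfer behaviour follows the text.

```python
from dataclasses import dataclass, field

@dataclass
class Mac1State:
    sp: int = 4000                                   # hypothetical initial SP
    ac: int = 0
    m: list = field(default_factory=lambda: [0] * 4096)

def desp(s, y):            # reserve y words: the stack grows downward
    s.sp -= y

def insp(s, y):            # release y words
    s.sp += y

def stol(s, x):            # store local: m[sp + x] := ac
    s.m[s.sp + x] = s.ac

def lodl(s, x):            # load local: ac := m[sp + x]
    s.ac = s.m[s.sp + x]

def addl(s, x):            # add local: ac := ac + m[sp + x], kept to 16 bits
    s.ac = (s.ac + s.m[s.sp + x]) & 0xFFFF

state = Mac1State()
desp(state, 2)             # two locals at offsets 0 and 1 from SP
state.ac = 5
stol(state, 0)
state.ac = 9
stol(state, 1)
lodl(state, 0)             # ac = 5
addl(state, 1)             # ac = 5 + 9
local_sum = state.ac
insp(state, 2)             # locals released, SP restored
```

The locals are reached through fixed offsets, so SP does not move between the DESP that reserves them and the INSP that releases them.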
The Macroinstruction Set: The instruction set (or repertoire) is the set of all instructions
that the Mac-1 is capable of executing. The Mac-1's architecture consists of a memory with 4096
16-bit words and three registers visible to the level 2 programmer. The registers are the program
counter (PC), the stack pointer (SP), and the accumulator (AC) which is used for moving data
around, for arithmetic, and for other purposes. Three addressing modes are provided: direct,
indirect, and local. Instructions using direct addressing contain a 12-bit absolute memory address
in their low-order 12 bits; and instructions using this format are usually called "memory reference
instructions". Indirect addressing allows the programmer to compute a memory address, put it in
the AC, and then read or write the word pointed at by the contents of the AC register; this mode
is sometimes called register indirect addressing. Local addressing specifies an offset from where
the SP points, and is used (among other things) to access local variables. Together, these three
addressing modes provide a simple but adequate addressing system.
MAC-1 Instruction Repertoire

OpCode            OpCode  Assembly                     Meaning
Binary            Hex     Mnemonic  Instruction        or Action

0000xxxxxxxxxxxx  0xxx    lodd      Load direct        ac:=m[x]
0001xxxxxxxxxxxx  1xxx    stod      Store direct       m[x]:=ac
0010xxxxxxxxxxxx  2xxx    addd      Add direct         ac:=ac+m[x]
0011xxxxxxxxxxxx  3xxx    subd      Subtract direct    ac:=ac-m[x]
0100xxxxxxxxxxxx  4xxx    jpos      Jump if positive   if ac≥0 then pc:=x
0101xxxxxxxxxxxx  5xxx    jzer      Jump if zero       if ac=0 then pc:=x
0110xxxxxxxxxxxx  6xxx    jump      Jump               pc:=x
0111xxxxxxxxxxxx  7xxx    loco      Load constant      ac:=x (0≤x≤4095)
1000xxxxxxxxxxxx  8xxx    lodl      Load local         ac:=m[x+sp]
1001xxxxxxxxxxxx  9xxx    stol      Store local        m[x+sp]:=ac
1010xxxxxxxxxxxx  axxx    addl      Add local          ac:=ac+m[x+sp]
1011xxxxxxxxxxxx  bxxx    subl      Subtract local     ac:=ac-m[x+sp]
1100xxxxxxxxxxxx  cxxx    jneg      Jump if negative   if ac<0 then pc:=x
1101xxxxxxxxxxxx  dxxx    jnze      Jump if nonzero    if ac≠0 then pc:=x
1110xxxxxxxxxxxx  exxx    call      Call procedure     sp:=sp-1; m[sp]:=pc; pc:=x
1111000000000000  f000    pshi      Push indirect      sp:=sp-1; m[sp]:=m[ac]
1111001000000000  f200    popi      Pop indirect       m[ac]:=m[sp]; sp:=sp+1
1111010000000000  f400    push      Push onto stack    sp:=sp-1; m[sp]:=ac
1111011000000000  f600    pop       Pop from stack     ac:=m[sp]; sp:=sp+1
1111100000000000  f800    retn      Return             pc:=m[sp]; sp:=sp+1
1111101000000000  fa00    swap      Swap ac, sp        tmp:=ac; ac:=sp; sp:=tmp
11111100yyyyyyyy  fcyy    insp      Increment sp       sp:=sp+y (0≤y≤255)
11111110yyyyyyyy  feyy    desp      Decrement sp       sp:=sp-y (0≤y≤255)
1111111111111111  ffff    halt      Halt machine       stops fetching instructions

xxxxxxxxxxxx is a 12-bit machine address (or constant); in column 2 it is called xxx and in column 5 it is
called x.
yyyyyyyy is an 8-bit constant; in column 2 it is called yy and in column 5 it is called y.

Figure 11: Table of Mac-1 Instructions


The Mac-1 instruction set is shown in Fig. 11. Each instruction contains an operation code
(opcode) and sometimes a memory address or constant. The opcode specifies the operation to be
performed and is shown in binary in the first column of the table. The 12 x's in the instructions
having a memory reference format reserve a 12-bit field for a memory address (or in the case
of LOCO a constant) to be specified by the level 2 programmer. The same is true of the 8 y's
in the INSP and DESP instructions that reserve an 8-bit constant field to be specified by the
level 2 programmer. Column two gives the instruction encoding in hexadecimal shorthand, and
column three specifies the assembly language mnemonic for each instruction's opcode. Although the
assembler program for this instruction set is case sensitive and wants to see the machine instruction
mnemonics in all lower-case letters, we will use upper-case in this text for emphasis when talking
about specific instructions. Column four gives a short description of what the instruction does
and column five specifies the action performed in a register transfer language notation. In column
five, if there is more than one action occurring, then each part of the action sequence is separated
from the next by a semicolon, and the sequence of actions occurs in left to right order. Column
five specifies the register transfers and actions using a pseudo-Pascal language fragment. In these
fragments, "m[x]" refers to memory word "x."
LODD loads the accumulator (AC register) from the memory word specified in its low-order
12 bits. LODD thus specifies direct addressing; whereas, LODL loads the accumulator from the
word at a distance "x" from where the SP register points and thus specifies indexed addressing
with the SP register acting as an index register. LODD, STOD, ADDD, and SUBD perform four
basic functions using direct addressing, and LODL, STOL, ADDL, and SUBL perform the same
functions using indexed (or local relative to the SP) addressing.
Five jump instructions are provided, one unconditional jump (JUMP) and four conditional
ones (JPOS, JZER, JNEG, and JNZE). JUMP always copies its low-order 12 bits into the program
counter (PC); whereas, the other four do so only if the specified condition is met.
LOCO loads a 12-bit constant in the range 0 to 4095 (inclusive) into the AC. PSHI pushes onto
the stack the word whose address is present in the AC register. The inverse operation is POPI,
which pops a word from the stack and stores it in the memory word whose address is in the AC
register. PSHI and POPI thus specify register indirect addressing using the implicit AC register
as the holder of the indirect address. PUSH and POP are useful for manipulating the stack in a
variety of ways. SWAP exchanges the contents of AC and SP, which provides a way of loading the
SP register with a new value. It is also useful for initializing SP at the start of execution. INSP
and DESP are used to change SP by amounts known at compile time. Because the number of
instructions to be encoded is more than a 16-bit word with a 12-bit address field will allow, it has
been necessary to trade off bits in the address field with bits in the opcode field and use "expanding
opcodes" to encode all of the instructions. The offsets for INSP and DESP are limited to 8 bits
in the (inclusive) range of 0 to 255. Finally, CALL calls a procedure, saving the return address on
the stack, and RETN returns from a procedure by popping the return address and putting it in
the PC register.
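The expanding-opcode scheme can be made concrete with a small decoder. The following Python sketch is an illustration based on the encodings in Fig. 11; the function name `decode` and the table names are made up here. It is stricter than the microprogram, which inspects only some of the low-order bits of the 1111 group, so treat the rejection of unlisted bit patterns as an assumption of this sketch.

```python
# Mnemonics for the fifteen 4-bit opcodes 0000 through 1110.
MEMORY_REFERENCE = [
    "lodd", "stod", "addd", "subd", "jpos", "jzer", "jump", "loco",
    "lodl", "stol", "addl", "subl", "jneg", "jnze", "call",
]

# Instructions in the 1111 group that carry no operand.
NO_OPERAND = {
    0xF000: "pshi", 0xF200: "popi", 0xF400: "push", 0xF600: "pop",
    0xF800: "retn", 0xFA00: "swap", 0xFFFF: "halt",
}

def decode(word):
    """Return (mnemonic, operand) for a 16-bit Mac-1 instruction word.
    The operand is None for instructions that have none."""
    word &= 0xFFFF
    opcode4 = word >> 12
    if opcode4 != 0xF:
        return MEMORY_REFERENCE[opcode4], word & 0x0FFF   # 12-bit address/constant
    if word in NO_OPERAND:
        return NO_OPERAND[word], None
    opcode8 = word >> 8
    if opcode8 == 0xFC:
        return "insp", word & 0x00FF                      # 8-bit constant
    if opcode8 == 0xFE:
        return "desp", word & 0x00FF
    raise ValueError(f"not a listed Mac-1 encoding: {word:04x}")

examples = [decode(w) for w in (0x0123, 0x7FFF, 0xE00A, 0xF800, 0xFC05, 0xFEFF)]
```

The decoder first looks at four opcode bits and only widens its view to eight or sixteen bits when those four bits are all ones, which is exactly the trade between address bits and opcode bits described above.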
Input/Output: The Mac-1 does not have any explicit input or output instructions. Instead,
it uses memory-mapped I/O. A read from address 4092 will yield a 16-bit word with the next
ASCII character from the standard input device in the low-order 7 bits and zeros in the high-order
9 bits of the AC register. When a character is available in the data register whose address is 4092,
the standard input device will set to 1 the high-order bit of the input status register at memory
address 4093. The action of loading the content of the input data register at memory address
4092 into the AC register clears (i.e., sets to zero) the content of flip-flops in the status register at
memory address 4093. The input routine will normally sit in a tight loop waiting for the content
of 4093 to go negative. When it does, the input routine will load the AC from 4092 and return.
Output is accomplished using a similar scheme. A write (i.e., store) to the output data register
at memory address 4094 copies the low-order 7 bits in the AC register to the standard output
device and at the same time clears (i.e., sets to 0) the high-order bit of the output status register
at memory address 4095. The high-order bit in the output status register at memory address 4095
is later set to 1 by the standard output device when it is again ready to accept another character
in its data register. Standard input and output may be a terminal keyboard and visual display,
or a card reader and printer, or some other combination. (Unfortunately, the simulators used to
execute level 2 programs on this macroarchitecture have not as yet implemented the input/output
data and status registers; so input and output are not simulated.)
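The input polling loop can be mimicked in Python. The sketch below is a toy model, not one of the simulators mentioned above: the `InputDevice` class, its `tick` method (which stands in for the device making the next character available), and the `max_polls` guard are all assumptions introduced for illustration.

```python
INPUT_DATA = 4092      # input data register address
INPUT_STATUS = 4093    # input status register address
READY_BIT = 0x8000     # high-order bit of a 16-bit word

class InputDevice:
    """Toy memory-mapped input device fed from a string."""

    def __init__(self, text):
        self.queue = list(text)
        self.data = 0
        self.status = 0

    def tick(self):
        """Device side: latch the next character, if any, and raise the ready bit."""
        if not (self.status & READY_BIT) and self.queue:
            self.data = ord(self.queue.pop(0)) & 0x7F    # 7-bit ASCII, upper bits zero
            self.status = READY_BIT

    def read(self, address):
        """CPU side: reading the data register clears the status register."""
        if address == INPUT_STATUS:
            return self.status
        if address == INPUT_DATA:
            self.status = 0
            return self.data
        raise ValueError("address is not an input register")

def read_char(device, max_polls=1000):
    """Busy-wait until the status word is 'negative' (bit 15 set), then load the data."""
    for _ in range(max_polls):
        if device.read(INPUT_STATUS) & READY_BIT:
            return device.read(INPUT_DATA)
        device.tick()
    raise TimeoutError("no character became available")

keyboard = InputDevice("Hi")
first = read_char(keyboard)
second = read_char(keyboard)
```

Testing bit 15 is the same as testing whether the status word is negative in 16-bit two's complement, which is why a Mac-1 input routine could use a conditional jump on the sign of the AC for this loop.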

An Example Microprogram

Having specified both the microarchitecture and the macroarchitecture in detail, the remaining
issue is the implementation: What does a program running on the former and interpreting the latter
look like, and how does it work? Here we will examine how the hardware components are controlled
by the microprogram and how the microprogram interprets the conventional machine level. Early
computers were not microprogrammed at all and had instructions for arithmetic, Boolean operations,
shifting, comparing, looping, and so on, that were all directly executed by the hardware. Modern
day reduced instruction set computers (RISC) do likewise, but their level 2 machine instructions are
merely highly encoded microinstructions; so in this case compilers translate the high level language
statements into sequences of microinstructions that are easy to decode and directly control the
microarchitecture's data path. Microprogrammed machines, on the other hand, interpret the level
2 machine instructions using a microprogram stored in control memory. The microprogram is
written by a microprogrammer (an individual who writes microprograms and not merely a small
programmer). The compilers for microprogrammed machines usually translate high-level languages
into sequences of level 2 machine language statements that are in turn fetched and decoded by the
microprogram that directly controls the data path's microarchitecture.
We could write the microprogram to fetch, decode and execute the level 2 machine instructions
by directly specifying the sequences of 32-bit binary numbers (to be stored in control memory)
that each directly control the hardware for one machine cycle comprising the four clock ticks of
the four-phase cycle. This tedious task is what ultimately must be done, but having a higher level
symbolic language notation that is then translated into the 32-bit numbers will make the task
easier.
The Micro Assembly Language (MAL): One possible notation is to have the microprogrammer specify one microinstruction per line, naming each nonzero field and its value. For example, to add (the contents of the) AC to (the contents of the) A register and store the result in the
AC register, we could write
ENC = 1, C = 1, B = 1, A = 10
Many microprogramming languages look like this; however, this notation is awful.
A much better idea is to use a high-level language notation, while retaining the basic concept of
one source line per microinstruction. Conceivably, one could write microprograms in an ordinary
high-level language, but because efficiency is crucial in microprograms, we will stick to assembly
language, which we define as a symbolic language that has a one-to-one mapping onto machine
instructions. Our high-level Micro Assembly Language will be called "MAL," the French word
for "sick." In MAL, stores into the 16 scratchpad registers or MAR and MBR are denoted by
assignment statements. Thus, the above example in MAL becomes: ac:=a + ac. (Because the
intention is to make MAL Pascal-like, we adopt the usual Pascal convention of lower-case names
for identifiers.)
To indicate the use of the ALU functions 0, 1, 2, and 3, we can write, for example,
ac:=a + ac, a:=band(ir,smask), ac:=a, and a:=inv(a),
respectively, where "band" stands for "Boolean AND" and "inv" stands for "invert" (i.e., bitwise
logical complement). Shifts can be denoted by the functions "lshift" for left shifts and "rshift" for
right shifts, as in
tir:=lshift(tir + tir)
which puts the contents of the TIR register on both the A and B buses, causes the ALU to perform
an addition, and shifts the sum 1 bit left before storing it back into the TIR register.
Unconditional jumps can be handled with goto statements; conditional jumps can test ALU
outputs N and Z; for example,
if n then goto 27

Assignments and jumps can be combined on the same line. However, a slight problem arises if
we wish to test a register but not make a store. How do we specify which register is to be tested?
To solve this problem, we introduce the pseudo variable "alu," which can be used in the language to
form a valid assignment statement but which in reality has no destination farther than the ALU's
output. (Recall that the ALU is made of only combinational logic components and contains no
registers or other memory devices.) For example,
alu:=tir; if n then goto 27
means that the content of the TIR register is to be run through the ALU unchanged on the A-bus
(ALU code = 2) so its high-order bit can be tested. Note that this use of "alu" means that ENC
= 0.
To indicate memory reads and writes, we will just put "rd" and "wr" in the source program.
The order of the various parts of the source statement is, in principle, arbitrary but to enhance
readability we will try to arrange them in the order that they are carried out. Fig. 12 gives a few
examples of MAL statements along with the translated fields of the corresponding microinstructions
(shown in decimal shorthand for each field).
Statement                                  AMUX COND ALU  SH MBR MAR  RD  WR ENC   C   B   A ADDR
mar:=pc; rd                                   0    0   2   0   0   1   1   0   0   0   0   0   00
rd                                            0    0   2   0   0   0   1   0   0   0   0   0   00
ir:=mbr                                       1    0   2   0   0   0   0   0   1   3   0   0   00
pc:=pc + 1                                    0    0   0   0   0   0   0   0   1   0   6   0   00
mar:=ir; mbr:=ac; wr                          0    0   2   0   1   1   0   1   0   0   3   1   00
alu:=tir; if n then goto 15                   0    1   2   0   0   0   0   0   0   0   0   4   15
ac:=inv(mbr)                                  1    0   3   0   0   0   0   0   1   1   0   0   00
tir:=lshift(tir); if n then goto 25           0    1   2   2   0   0   0   0   1   4   0   4   25
alu:=ac; if z then goto 22                    0    2   2   0   0   0   0   0   0   0   0   1   22
ac:=band(ir, amask); goto 0                   0    3   1   0   0   0   0   0   1   1   8   3   00
sp:=sp + (-1); rd                             0    0   0   0   0   0   1   0   1   2   2   7   00
tir:=lshift(ir + ir); if n then goto 69       0    1   0   2   0   0   0   0   1   4   3   3   69

Figure 12: Some MAL statements and their corresponding microinstructions.
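The translation from field values to a 32-bit control word can be sketched as follows. This Python fragment is illustrative only: it assumes that the thirteen fields are packed from the most significant bit downward in the column order of Fig. 12 (AMUX first, ADDR last) with widths 1, 2, 2, 2, 1, 1, 1, 1, 1, 4, 4, 4, and 8 bits, which sum to 32; the names `FIELDS`, `pack`, and `unpack` are made up here.

```python
# (field name, width in bits), most significant field first (assumed layout).
FIELDS = [
    ("amux", 1), ("cond", 2), ("alu", 2), ("sh", 2), ("mbr", 1), ("mar", 1),
    ("rd", 1), ("wr", 1), ("enc", 1), ("c", 4), ("b", 4), ("a", 4), ("addr", 8),
]

def pack(**values):
    """Build a 32-bit microinstruction word; unnamed fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word):
    """Split a 32-bit word back into a dictionary of field values."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

# Row for "tir:=lshift(tir); if n then goto 25" from Fig. 12.
shift_and_test = pack(cond=1, alu=2, sh=2, enc=1, c=4, a=4, addr=25)
round_trip = unpack(shift_and_test)
```

A MAL assembler would do essentially this after parsing each source line, which is why MAL keeps a one-to-one mapping between source lines and microinstructions.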
The Example Microprogram: We have finally reached the point where we can put all the
pieces together. Fig. 13 is the microprogram that runs on the Mic-1 and interprets the Mac-1. It
is a surprisingly short program, only 81 lines. By now the choice of names for the scratchpad
registers in Fig. 5 is obvious: PC, AC, and SP are used to hold the three Mac-1 registers. IR is the
instruction register and holds the macroinstruction currently being executed. TIR is a temporary
copy of the IR, used for decoding the opcode. The next three registers hold the indicated constants.
AMASK is the address mask, 0FFF$_{16}$, and is used to separate out opcode and address bits. SMASK
is the stack mask, 00FF$_{16}$, and is used in the INSP and DESP instructions to isolate the 8-bit offset
value. The remaining six registers have no assigned function and can be used as scratch registers
for whatever the microprogrammer wishes.

Like all interpreters, the microprogram in Fig. 13 has a main loop that fetches, decodes, and
executes instructions from the program being interpreted, in this case level 2 instructions. Its
main loop begins on line 0, where it begins fetching the macroinstruction whose memory address
is in the PC register. While waiting for this instruction to arrive, the microprogram increments
the content of the PC and continues to assert the "Rd" bus signal. When it arrives, in line 2, it is
stored in the IR register and simultaneously the high-order bit (bit 15) is tested. If bit 15 is a 1,
decoding proceeds to line 28; otherwise, it continues on line 3. Assuming for the moment that the
instruction is a LODD, bit 14 is tested on line 3, and the TIR register is loaded with the original
instruction shifted left 2 bit positions, one shift using the adder and one using the shifter. Note
that the ALU status bit N is determined by the ALU output in which bit 14 is the high-order bit,
because IR + IR shifts the IR contents left 1 bit position. The shifter output does not affect the
ALU status bit.
All instructions having 00 in their two high-order bits eventually come to line 4 to have bit 13
tested, with the instructions beginning with 000 going to line 5 and those beginning with 001 going
to line 11. Line 5 is an example of a microinstruction with ENC = 0; it just tests the content of the
TIR register, but does not change it. Depending on the outcome of this test, the code for LODD
or STOD is selected.
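The bit-by-bit decoding can be traced with a short simulation. The Python sketch below follows only the decode microinstructions for the 4-bit opcodes, with jump targets taken from Fig. 13; the table `DECODE` and the function `first_execution_line` are names invented for this illustration. The N bit is taken from the 16-bit ALU output before the shifter, as described above.

```python
# line -> (kind of decode step, jump target taken when N = 1)
#   "ir":    ALU passes IR unchanged          (line 2: ir:=mbr)
#   "add":   tir:=lshift(ir + ir)             (N comes from ir + ir)
#   "shift": tir:=lshift(tir)                 (N comes from the old TIR)
#   "test":  alu:=tir                         (TIR is not changed)
DECODE = {
    2: ("ir", 28),
    3: ("add", 19), 4: ("shift", 11), 5: ("test", 9), 11: ("test", 15),
    19: ("shift", 25), 20: ("test", 23), 25: ("test", 27),
    28: ("add", 40), 29: ("shift", 35), 30: ("test", 33), 35: ("test", 38),
    40: ("shift", 46), 41: ("test", 44), 46: ("shift", 50),
}

def first_execution_line(ir):
    """Return the control store line reached once the 4-bit opcode is decoded."""
    ir &= 0xFFFF
    tir = 0
    line = 2
    while line in DECODE:
        kind, target = DECODE[line]
        if kind == "ir":
            alu = ir
        elif kind == "add":
            alu = (ir + ir) & 0xFFFF
            tir = (alu << 1) & 0xFFFF
        elif kind == "shift":
            alu = tir
            tir = (tir << 1) & 0xFFFF
        else:
            alu = tir
        n = alu >> 15                      # N is the high-order bit of the ALU output
        line = target if n else line + 1
    return line

# Starting line for each 4-bit opcode 0000 through 1111 (address bits zero).
entry_lines = [first_execution_line(opcode << 12) for opcode in range(16)]
```

For opcode 1111 the returned line is 50, where decoding of the low-order bits continues; for the other fifteen opcodes it is the first microinstruction of the execution routine.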
For LODD, the microcode must first fetch the word directly addressed by loading the low-order
12 bits of the IR into the MAR. In this case, the high-order 4 bits are all zero, but for STOD and
other instructions they are not. However, because the MAR is only 12 bits wide and connected to
only the low-order 12 bits on the B-bus, the opcode bits do not affect the choice of the word to be
read. In line 7, the microprogram has nothing to do, so it just waits. When the word arrives, it
is copied into the AC register and the microprogram jumps back to the top of the loop where the
instruction fetch cycle begins. STOD, ADDD, and SUBD are similar. The only noteworthy point
concerning them is how subtraction is done.
Recall that in radix $r$ the radix complement (RC) of a number $x$ is defined to be $RC(x) = r^n - x$.
Similarly, the diminished radix complement (DRC) of $x$ (also called the $(r-1)$'s complement) is
defined to be $DRC(x) = r^n - r^{-m} - x$. When $m = 0$ so that we are dealing only with $n$-bit registers
containing integers, then the 1's complement of $x$ is $1\text{'s}(x) = 2^n - 2^0 - x = 2^n - 1 - x$. The 2's
complement of $x$ is then $2\text{'s}(x) = 2^n - x = 1\text{'s}(x) + 1$, where the 1's complement of $x$ is the same
as the bitwise logical complement of the $n$-bit number $x$. Thus, SUBD makes use of the fact that
$$x - y = x + (-y) = x + (\overline{y} + 1) = x + 1 + \overline{y}$$
in two's complement. The addition of 1 to the content of the AC is done on line 16 (using the
commutativity of addition); otherwise line 16 would be wasted like line 13.
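The identity used by SUBD is easy to check numerically for 16-bit words. The Python sketch below mirrors the three arithmetic steps of lines 16 through 18 (add 1 to the AC, invert the fetched word, add); the function names are made up for illustration and memory access is left out.

```python
MASK = 0xFFFF   # 16-bit word

def inv(x):
    """Bitwise logical complement of a 16-bit word (the 1's complement)."""
    return ~x & MASK

def subd_steps(ac, y):
    """Compute ac - y the way the SUBD microcode does: ac + 1 + inv(y)."""
    ac = (ac + 1) & MASK        # line 16: ac:=ac + 1
    a = inv(y)                  # line 17: a:=inv(mbr)
    return (ac + a) & MASK      # line 18: ac:=ac + a

# Compare against ordinary subtraction modulo 2**16 on a sample of operands.
samples = [0, 1, 2, 3, 10, 255, 4095, 0x7FFF, 0x8000, 0xFFFF]
identity_holds = all(
    subd_steps(x, y) == (x - y) & MASK for x in samples for y in samples
)
```

Since `inv(y)` equals $2^{16} - 1 - y$, the sum `x + 1 + inv(y)` equals $x - y + 2^{16}$, and the mask discards the extra $2^{16}$.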
The microcode for JPOS begins on line 21. If the content of the AC < 0, the branch fails
and JPOS is terminated immediately by jumping back to the main loop and fetching the next
instruction in sequence. If, however, the content of the AC ≥ 0, the low-order 12 bits of the IR are
extracted by ANDing them with the 0FFF$_{16}$ mask in the AMASK register and storing the result
in the PC register. It does not cost anything extra to remove the opcode bits here, so we might
as well do it. If it had cost an extra microinstruction, however, we would have had to look very
carefully to see if having garbage in the high-order 4 bits of the PC could cause trouble later.
In a certain sense, JZER (line 23) works the opposite of JPOS. With JPOS, if the test condition
is met, the jump fails and control returns to the main loop. With JZER, if the test condition is met,
the jump is taken. Because the code for performing the jump is the same for all jump instructions,
we can save microcode by just going to line 22 whenever feasible. This style of programming
generally would be considered uncouth in an application program, but in a microprogram no holds
are barred. Performance is everything.

Microprogram to fetch, decode, and execute Mac-1 instructions

Adr: Microinstruction                               Comment

 0: mar:=pc; rd;                                    fetch instr
 1: pc:=pc + 1; rd;                                 increment pc
 2: ir:=mbr; if n then goto 28;                     decode ir15
 3: tir:=lshift(ir + ir); if n then goto 19;        decode ir14
 4: tir:=lshift(tir); if n then goto 11;            decode ir13
 5: alu:=tir; if n then goto 9;                     decode ir12
 6: mar:=ir; rd;                                    0000 = LODD
 7: rd;
 8: ac:=mbr; goto 0;
 9: mar:=ir; mbr:=ac; wr;                           0001 = STOD
10: wr; goto 0;
11: alu:=tir; if n then goto 15;                    decode ir12
12: mar:=ir; rd;                                    0010 = ADDD
13: rd;
14: ac:=mbr + ac; goto 0;
15: mar:=ir; rd;                                    0011 = SUBD
16: ac:=ac + 1; rd;
17: a:=inv(mbr);
18: ac:=ac + a; goto 0;
19: tir:=lshift(tir); if n then goto 25;            decode ir13
20: alu:=tir; if n then goto 23;                    decode ir12
21: alu:=ac; if n then goto 0;                      0100 = JPOS
22: pc:=band(ir,amask); goto 0;                     perform jump
23: alu:=ac; if z then goto 22;                     0101 = JZER
24: goto 0;                                         else don't jump
25: alu:=tir; if n then goto 27;                    decode ir12
26: pc:=band(ir,amask); goto 0;                     0110 = JUMP
27: ac:=band(ir,amask); goto 0;                     0111 = LOCO
28: tir:=lshift(ir + ir); if n then goto 40;        decode ir14
29: tir:=lshift(tir); if n then goto 35;            decode ir13
30: alu:=tir; if n then goto 33;                    decode ir12
31: a:=ir + sp;                                     1000 = LODL
32: mar:=a; rd; goto 7;
33: a:=ir + sp;                                     1001 = STOL
34: mar:=a; mbr:=ac; wr; goto 10;
35: alu:=tir; if n then goto 38;                    decode ir12
36: a:=ir + sp;                                     1010 = ADDL
37: mar:=a; rd; goto 13;
38: a:=ir + sp;                                     1011 = SUBL
39: mar:=a; rd; goto 16;
40: tir:=lshift(tir); if n then goto 46;            decode ir13
41: alu:=tir; if n then goto 44;                    decode ir12
42: alu:=ac; if n then goto 22;                     1100 = JNEG
43: goto 0;
44: alu:=ac; if z then goto 0;                      1101 = JNZE
45: pc:=band(ir,amask); goto 0;
46: tir:=lshift(tir); if n then goto 50;            decode ir12
47: sp:=sp + (-1);                                  1110 = CALL
48: mar:=sp; mbr:=pc; wr;
49: pc:=band(ir,amask); wr; goto 0;
50: tir:=lshift(tir); if n then goto 65;            decode ir11
51: tir:=lshift(tir); if n then goto 59;            decode ir10
52: alu:=tir; if n then goto 56;                    decode ir9
53: mar:=ac; rd;                                    1111-0000 = PSHI
54: sp:=sp + (-1); rd;
55: mar:=sp; wr; goto 10;
56: mar:=sp; sp:=sp + 1; rd;                        1111-0010 = POPI
57: rd;
58: mar:=ac; wr; goto 10;
59: alu:=tir; if n then goto 62;                    decode ir9
60: sp:=sp + (-1);                                  1111-0100 = PUSH
61: mar:=sp; mbr:=ac; wr; goto 10;
62: mar:=sp; sp:=sp + 1; rd;                        1111-0110 = POP
63: rd;
64: ac:=mbr; goto 0;
65: tir:=lshift(tir); if n then goto 73;            decode ir10
66: alu:=tir; if n then goto 70;                    decode ir9
67: mar:=sp; sp:=sp + 1; rd;                        1111-1000 = RETN
68: rd;
69: pc:=mbr; goto 0;
70: a:=ac;                                          1111-1010 = SWAP
71: ac:=sp;
72: sp:=a; goto 0;
73: tir:=lshift(tir); if n then goto 76;            decode ir9
74: a:=band(ir,smask);                              1111-1100 = INSP
75: sp:=sp + a; goto 0;
76: alu:=tir; if n then goto 80;                    decode ir8
77: a:=band(ir,smask);                              1111-1110 = DESP
78: a:=inv(a);
79: a:=a + 1; goto 75;
80: halt; goto 80;                                  1111-1111 = HALT

The execution cycle for each decoded MAC-1 instruction begins at the control store address whose line
is labeled with a comment showing the assembly language mnemonic for the corresponding instruction
(capitalized for emphasis). "Adr:" is the control store address. The instruction fetch cycle begins at control
store address zero.

Figure 13: Microinstructions to fetch, decode, and execute Mac-1 instructions on the example Mic-1
microarchitecture

JUMP and LOCO are straightforward, so the next interesting execution routine is for LODL.
First the absolute memory address to be referenced is computed by adding the offset contained in
the instruction to the content of the SP register. Then the memory read is initiated. Because the
rest of the code is the same for LODL and LODD, we might as well use lines 7 and 8 for both
of them. Not only does this save control store space with no loss of execution speed but it also
means fewer routines to debug. Analogous code is used for STOL, ADDL, and SUBL. The code for
JNEG and JNZE is similar to JZER and JPOS, respectively (not the other way around). CALL
first decrements the content of the SP register, then pushes the return address (which is the current
content of the PC register) onto the stack, and finally jumps to the called procedure. Line 49 is
almost identical to line 22; if it had been exactly the same, we could have eliminated line 49 by
putting an unconditional jump to 22 in 48. Unfortunately, we must continue to assert "Wr" for
another microinstruction.
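The effect of the LODL and CALL routines just described can be mimicked at the register-transfer level. The following Python sketch is illustrative only: the Machine class, its field names, the 4096-word memory, and the 12-bit AMASK value are assumptions made for this example, not definitions taken from the microprogram.

```python
AMASK = 0x0FFF  # assumed: the low-order 12 bits of a word form the address field


class Machine:
    """Hypothetical register-level model; all names are illustrative only."""

    def __init__(self, words=4096):
        self.mem = [0] * words  # assumed 4096-word main memory
        self.ac = 0             # accumulator
        self.pc = 0             # program counter
        self.sp = 0             # stack pointer

    def lodl(self, ir):
        # a := ir + sp; mar := a; rd
        # Only the address bits are kept (assumed 12-bit MAR), so the
        # opcode bits of ir drop out when the sum is masked.
        addr = (ir + self.sp) & AMASK
        # Shared tail with LODD (lines 7 and 8): ac := mbr
        self.ac = self.mem[addr]

    def call(self, ir):
        self.sp = (self.sp - 1) & AMASK  # decrement SP first
        self.mem[self.sp] = self.pc      # push the return address
        self.pc = ir & AMASK             # jump to the called procedure


m = Machine()
m.sp = 100
m.mem[103] = 42
m.lodl(0x8003)   # 1000 in the high-order four bits, offset 3
m.pc = 7
m.call(0xE014)   # call() ignores the high-order four bits; target is 20
print(m.ac, m.sp, m.mem[99], m.pc)
```

Masking the sum rather than the offset mirrors the microprogram, which adds the whole IR to SP and lets the narrower address path discard the opcode bits.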
The rest of the macroinstructions all have 1111 as their high-order 4 bits, so decoding of (at
least some of) the low-order 12 bits in these instructions is required to tell them apart. The actual
execution routines are straightforward so we will not comment on them further.
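The bit-by-bit decoding of the 1111 group can be sketched as a chain of single-bit tests. The Python function below is a hypothetical illustration: it covers only the five encodings listed in Fig. 13 with ir11 = 1 (RETN, SWAP, INSP, DESP, HALT), lumps everything else under "other", and counts how many bits had to be examined.

```python
def decode_1111_group(ir):
    """Return (mnemonic, bits_tested) for a 16-bit word that starts with 1111."""
    if (ir >> 12) != 0xF:
        raise ValueError("high-order four bits must be 1111")
    tested = 0

    def bit(n):
        # Examine one bit of ir and record that a test was made.
        nonlocal tested
        tested += 1
        return (ir >> n) & 1

    if not bit(11):
        name = "other"  # encodings with ir11 = 0 are not modeled here
    elif not bit(10):
        name = "SWAP" if bit(9) else "RETN"
    elif not bit(9):
        name = "INSP"
    else:
        name = "HALT" if bit(8) else "DESP"
    return name, tested


for word in (0xF800, 0xFA00, 0xFC05, 0xFE05, 0xFF00):
    print(f"{word:04X}", decode_1111_group(word))
```

In this sketch RETN, SWAP, and INSP are recognized after three tests, while DESP and HALT need a fourth because they differ only in ir8.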
A few more points are worth making. In Fig. 13 we increment the content of the PC register
in line 1. It could equally well have been done in line 0, thus freeing line 1 for something else while
waiting for memory to respond. In this machine there is nothing else to do, but in a real machine
the microprogram might use this opportunity to check for I/O devices awaiting service, refresh
dynamic RAM, or something else.
If we leave line 1 the way it is, however, we could speed up the machine by modifying line 8 to
read
mar := pc; ac := mbr; rd; goto 1;
In other words, we can start fetching the next instruction before we have really finished with the
current one. This capability provides a primitive form of instruction pipelining. The same trick
can be applied to other execution routines as well.
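A rough count shows what the overlap buys. The sketch below assumes that one LODD visits control store addresses 0 through 8 in order (fetch, decode, execute) and that each microinstruction takes one cycle; the function name and the all-LODD workload are hypothetical, chosen only to keep the arithmetic simple.

```python
# Assumed control-store path for one LODD: fetch (0-2), decode (3-5), execute (6-8).
LODD_PATH = list(range(0, 9))


def microinstructions(n, overlapped):
    """Microinstructions executed for n back-to-back LODD instructions."""
    if n == 0:
        return 0
    per_instruction = len(LODD_PATH)
    if not overlapped:
        return n * per_instruction
    # With the modified line 8, every instruction after the first enters
    # the fetch sequence at line 1 instead of line 0.
    return per_instruction + (n - 1) * (per_instruction - 1)


print(microinstructions(10, overlapped=False))
print(microinstructions(10, overlapped=True))
```

Under these assumptions the saving is one microinstruction per instruction after the first, roughly 11 percent for this routine.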
It is clear that a substantial amount of the execution time of each macroinstruction is devoted
to decoding it bit by bit. This observation suggests that it might be useful to be able to load
the MPC register under microprogram control. On many existing computers the microarchitecture
has hardware support for extracting macroinstruction opcodes and stuffing them directly into the
MPC to effect a multiway branch. If, for example, we could shift the IR 9 bits to the right and
put the resulting number into the MPC, we would have a 128-way branch to locations 0 through
127. Each of these words would contain the first microinstruction in the execution sequence for
the corresponding macroinstruction. Although this approach wastes control store space, it greatly
speeds up the machine, so something like it is nearly always used in practice.
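The 128-way branch amounts to a single shift. As a minimal sketch, assuming a 16-bit IR, the hypothetical function below computes the value that would be loaded into the MPC; the sample instruction words use the encodings shown in Fig. 13.

```python
def dispatch_address(ir):
    """High-order 7 bits of a 16-bit instruction word: a value in 0..127."""
    return (ir & 0xFFFF) >> 9


samples = [("LODL 3", 0x8003), ("INSP 5", 0xFC05),
           ("DESP 5", 0xFE05), ("HALT", 0xFF00)]
for name, word in samples:
    print(name, dispatch_address(word))
```

With these encodings DESP and HALT map to the same entry (127), because they differ only in ir8, so the microinstruction at that location would still have to test one more bit.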
By using memory-mapped I/O, the CPU is not aware of the difference between true memory
addresses and I/O device registers. The microprogram handles reads and writes to the top four
words of the address space the same way it handles any other reads and writes.
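The following Python sketch illustrates the idea under stated assumptions: a 4096-word address space whose top four words are device registers, a hypothetical Bus class, and an output register placed at address 4094 purely for the example. The store routine is the same for both destinations; only the memory system tells them apart.

```python
class Bus:
    """Hypothetical memory system: RAM plus device registers at the top addresses."""

    def __init__(self, words=4096, io_words=4):
        self.io_base = words - io_words  # first device-register address
        self.ram = [0] * self.io_base
        self.devices = {}                # address -> (read_fn, write_fn)

    def attach(self, addr, read_fn, write_fn):
        if addr < self.io_base:
            raise ValueError("device registers live in the top words only")
        self.devices[addr] = (read_fn, write_fn)

    def read(self, addr):
        if addr >= self.io_base:
            return self.devices[addr][0]()
        return self.ram[addr]

    def write(self, addr, value):
        if addr >= self.io_base:
            self.devices[addr][1](value)
        else:
            self.ram[addr] = value


def store(bus, addr, ac):
    # CPU side: one code path, with no knowledge of what lies behind addr.
    bus.write(addr, ac)


printed = []  # stands in for an output device
bus = Bus()
bus.attach(4094, read_fn=lambda: 0, write_fn=printed.append)
store(bus, 200, 65)   # ordinary memory word
store(bus, 4094, 65)  # device register, same routine
print(bus.read(200), printed)
```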
Designing a machine as a series of levels is done for efficiency and simplicity because each level
deals only with another level of abstraction. The level 0 designer worries about how to squeeze the
last few nanoseconds out of the ALU by using some means to reduce carry-propagation time. The
microprogrammer worries about how to get the most mileage out of each microinstruction, typically
by exploiting as much of the hardware's inherent parallelism as possible. The macroinstruction set
designer worries about how to provide an interface that both the compiler writer and microprogrammer
can learn to love, and be efficient at the same time. Clearly, each level has different goals,
problems, techniques, and in general, a different way of looking at the machine. By splitting the
total machine design problem into several subproblems, we can attempt to master the inherent
complexity in designing a modern computer.