Machines, Machine Languages, and Digital Logic 37
2.2.4 BRANCH INSTRUCTIONS
In Chapter 1 we mentioned the program counter (PC) as pointing to the next instruction
to be executed. Thus, the PC controls program flow. During normal program execution,
the PC is incremented during the instruction fetch to point to the next instruction. A
transfer of control, or a branch, to an instruction other than the next one in sequence
requires computation of a target address, which is the address to which control is to branch target
be transferred, A target address is specified in a branch or jump instruction. The target ldress
address is loaded into the PC, replacing the address stored there.
‘There may be special branch target registers within the CPU. Ifa branch target register
is preloaded with the target address prior to the execution of the branch instruction, the
CPU will prefetch the instruction located at the branch target address, so that instruction
will be ready for execution when the branch instruction finishes execution, Branch target
registers are examples of registers that have “personality,” as discussed at the beginning of
Section 2.1.
A branch may be unconditional, as is the C goto statement, or it may be conditional,
which means it depends on whether some condition within the processor state is true or false.
‘The conditional branch is used to implement the condition part of C statements such as
if (X<0) X = -X;
‘There is no machine instruction that corresponds directly to the conditional statement above.
‘The approach most machines take is to set various status flags within the CPU as a result of
ALU operations. The instruction set contains a number of conditional branch instructions
that test various of these flags and then branch or not according to their settings. The bit or
bits that describe the condition are stored in a register variously called the processor status
‘word (PSW), the condition code (CC) register, of the status register. Some of the status bits condition code
record the results of arithmetic operations. The VAX11 code to implement the preceding (CC)
statement is
CMP x, 0
BGE OVER
MNEG X, X
OVER:
‘The most'common condition-code bits are zero (Z), overflow (V), carry (C), and negative (N),
which are set to indicate that the last arithmetic operation resulted in a zero result, an arithmetic
overflow, a carry-out of the most significant bit (msb), or a negative result, respectively.
An alternative approach is to compute the difference of two values and store the result
in a register. A conditional branch instruction is then used to test the computed value to see
whether the condition is met. This is the method used in the SRC machine to be described
in Section 2.3 and 2.4. We discuss conditional branching and the merits of these and other
approaches in Chapter 6, “Computer Arithmetic and the Arithmetic Unit.”
Table 2.3 shows several examples of branch instructions. Notice that the SOB (subtract
one and branch) and JCXZ instructions are specifically designed to control loops; they test
the value of a register and branch if the result is zero. The SOB instruction even handles
decrementing the register prior to the test. These instructions are used to implement higher-
level language constructs such as for, whi Te, and repeat.38 Computer Systems Design and Architecture
Table 2.3 Example Branch Instructions
ane
BLBS A, Tot Branch to address Tgt if the least significant bit at location Ais set. | VAXIL
bun r2 Branch to location in r2 if the previous floating point comparison | PPC G4
signaled that one or more of the values was not a number.
beq $2,$1,32 | Branch to location PC+4+32 if contents of $1 and $2 are equal. MIPS R3000
SOB R4 Loop —_| Decrement R4 and branch to address Loop if result x 0. DEC PDPII
JOXZ Addr’ [ jump to Addr if contents of register Cx = 0. Intel 8086
2.2.5 4-, 3-,2-, 1-, ano O-Aporess AnD GENERAL
Recister Machine Classes
‘Machine instructions must be encoded into a bit pattern that somehow specifies all of the
four items discussed in the start of Section 2.2, Two examples of such encodings were
shown in Table 1.2 for the MC68000 processor. The instruction set designer would like to
minimize the number of bits devoted to the specification, while at the same time allowing
maximum flexibility in how these items can be specified. All things being equal, the
designer would like the entire encoding for the instruction to fit into one machine word,
and in fact this has become a hallmark of the RISC approach. For a two-operand arithmetic
instruction, five items need to be specified:
1. Operation to be performed
2, Location of first operand
3. Location of second operand
4, Place to store the result
5. Location of next instruction to be executed
In this section we consider a number of abstract or hypothetical machines that vary in how
many of these five items are explicitly specified, from five down to one. In each hypothetical
machine, we study the encoding of an ALU instruction.
‘As we consider each machine, we quantify the number of bits required to encode
‘one of its instructions. To make the encodings explicit, we assume that the hypothetical
machine has 24-bit memory addresses and has 128 instructions. This means 3 bytes will
be required to encode each address, and 7 bits to specify one of the 128 instructions. The
7 bits will be rounded up to 1 byte.
‘The 4-Address Machines and Operations. ‘The 4-address machine specifies all
of the last four items in the list above, so it will require an address for each operand,
one for the result, and one for the next instruction, Figure 2.3 shows schematically the
programmer's model of the machine operation and instruction format. Notice that there
are no programmer-accessible registers in the CPU, as none are needed. The operands
are both fetched from memory, and the result is returned to memory. The address of the
next instruction is also specified as an explicit value.Machines, Machine Languages, and Digital Logic
Memon CPU add, Res, Opt, Op2, Nexti (Res — Opt + Op2)
Optaddr:
Op2addr:
‘Opt
(Op2
ResAddr:| Res
Nextiddr:
Instruction format
Bits: 8 24 24 24 24
add | Restdar [ Opracar | Op2adar | NextiAddr
Which — Where to Where to find
‘operation putresuit Where tofind operands next instruction
Fig. 2.3 The 4-Address Machine and Instruction Format
23, 6 0
Wasted ‘Op
‘Operand 1 address
Operand 2 address
Result address
‘Address of next instruction
Fig. 2.3a Layout of a 4-Address Instruction in Memory
Each address requires 3 bytes, so it will require 4 x 3 + 1 = 13 bytes to encode a
4-address ALU instruction. If the normal path between memory and the CPU is 24-bit
word “chunks,” then the instruction will occupy five words in memory, and five words will
need to be transferred to the CPU just to specify the instruction. A layout of the instruction
in memory might appear as shown in Figure 2.3a.
Let us now count the number of memory accesses required when the instruction
executes. Five words will be transferred to the CPU when the instruction itself is fetched,
as shown in the preceding figure. Then the two words representing the operands themselves
need to be fetched into the CPU, and after the addition has been performed, the result needs
to be written back to memory. This means that 5 + 2+ 1 = 8 words must be transferred to
add the two words and store the result.
Because of the large instruction word size and number of memory accesses, the
4-address machine and instruction format is not normally seen in machine design,
although the 4-address structure is used intemally in some implementations of computer
control units. This kind of controller implementation is known as microcoded control. In
a microcoded design, the steps required to execute an instruction are themselves stored as
fa sequence of microcode instructions that are executed to effect instruction execution, We
discuss the design of microcoded machines in Chapter 4. Time and space costs make the
4-address format uncompetitive for other uses. In fact, much of the effort designers put into
memory
accesses per
instruction
3940 Computer Systems Design and Architecture
Memory. cpu add, Res, Op1, Op2 (Res « Op2 + Opt)
OptAdde|_Opt_|-—+>
OpeAddr:| “Ope
ResAddr| Rese
1
[Program
NextiAddr:| Next fe— counter
Where to find
‘next instruction
- Instruction format
Bits:_8 24 24 24
add | ResAddr | OptAddr | Op2Addr
Which — Where to
operation put result Where to find operands
Fig. 2.4 The 3-Address Machine and Instruction Format
instruction set organization goes toward reducing the number of instruction bits needed
to specify the preceding five items. In some hypothetical implementation domain where
memory costs and access times were small compared to ALU execution time, however, the
4-address machine could be considered as a design alternative.
‘The 3-Address Machines and Operations. The inclusion within the CPU of a program
rogram counter counter that always points to the next instruction eli
inates the need to specify the address
of the next instruction in all but the class of branch instructions. It is now the responsibility
of the control unit to know the size of the currently executing,
truction, so the control unit
can advance the PC to point to the next address beyond. Machines with a program counter
but no operand storage in the CPU are known as 3-address machines. Figure 2.4 shows
the operation of a 3-address machine and the corresponding instruction format. Note the
inclusion of a program counter that points to the next instruction.
Figure 2.4 also shows that instru
's and data may be stored in different parts of
memory. The instruction format now includes only four fields: one that specifies which
ope
specifies the result address.
is to be performed, two that specify operand memory addresses, and one that
‘There is neither the need nor the ability to specify the address of the next instruction.
In our example above, the number of bytes would be reduced from 13 to 3 x3 +1= 10
bytes, or four 24-bit words in a machine that accessed 24-bit words instead of bytes.
The 2-Address Machines and Operations. A reduction to two addresses can be
obtained by storing the result into the memory address of one of the operands. Figure 2.5
shows the machine structure and instruction format for the 2-address machine. The change
from a 3-address machine to a 2-address machine does not require any change in the
register structure of the CPU, but only in the instruction meaning and format. Now our
example instruction is reduced from 10 bytes to 2 x 3 + 1 = 7 bytes, or three 24-bit words
in a word-oriented machine.Machines, Machine Languages, and Digital Logic 41
Memory cru add Op2, Opt (Op2—- Op2 + Opt)
OprAdar:|“Opt_|-—+ '
Op2Addr:|Op2 Res}
Program
NostiAddr: ‘counter
Where to find
next instruction
Instruction format
Bits:_8 24 24
add | Op2acar | Opiaaar
Wien 7 Where fo find operands
operation,
Where to
put result
Fig. 2.5 The 2-Address, Machine and Instruction Format
The 1-Address (Accumulator) Machines and Instructions. Let us now add a single
accumulator, Acc, to the CPU and use it both for the source of one operand and as the
result destination. Figure 2.6 shows the programmer's model and instruction format for
such a machine, Now the result is kept in the accumulator in the CPU and may be used for
further computations. The Acc serves both as the source of one operand and as the storage
location for the result. Notice that because there is only a single accumulator, it need not be
mentioned in the machine instruction.
add Opt (Acc + Acc + Opt)
prada:
Where to find
operand2. and
A where 19 put result
Nextiaddr:
Instruction format
Bis:_8 24
‘add | OptAdar
Which Where to find
‘operation operand
Fig. 2.6 The 1-Address Machine and Instruction Format42 Computer Systems Design and Architecture
load and store
stack, push and
The 1-address instruction requires additional operations to load and store the
accumulator's contents, however. These instructions are also 1-address instructions. The
instruction Ida Addr loads the accumulator from address Addr, and sta Addr stores
the accumulator’s contents into address Addr:
Acc = OpAddr; > Ida OpAddr
OpAddr = Acc; > sta OpAddr
Our example addition now requires only 1 x 3 + 1 =4 bytes, or two 24-bit words for the
addition, although if one operand were not in memory, or if the final result needed to be
written back to memory, additional load or store instructions would be required.
‘The 1-address machines generally provide a minimum in the size of both program and
CPU memory required, and the architecture was quite popular in very early mainframes
and early microcomputers. The Intel 8080, Motorola 6800, and MOS Technology 6502
were examples of machines that contained accumulators.
‘The 0-Address (Stack) Computers and Address Formats. The inclusion of a push-
down stack in the CPU allows ALU instructions with no addresses. Operands are pushed
onto the stack from memory, and ALU operations implicitly operate on the top members
of the stack, Figure 2.7 shows the add operation performed on the two operands in the top
and second positions on the stack. The operation removes both operands from the stack and
replaces them with the result. The push operation from memory to stack is also shown. The
code to add two memory operands is a bit more complex:
Op3 = Opl Op2; push Op1
pee tees Bush Ope
add
pop Op3
Notice that the push and pop operations still require a memory address, and the
word count for the code above is 3 + 1 = 4 bytes for each push and pop, and an additional
Instruction formats
Memory ee push Opt (TOS — Opt)
OptAddr:| Opt Bits; 6 24
t Format [push | Optacdr_]
Operation Result
add (TOS « TOS + SOS)
Sak Bins: 8
"Format add
NextiAddr:| Ned }e;—{~ Program
’ counter_|°* ‘ Wich operation
| Where to fina \ where
' ‘ to find operands,
vo — Seminervovon 1 and where to put result
{on the stack)
Fig. 2.7 The 0-Address, or Stack, Machine and Instruction FormatsMachines, Machine Languages, and Digital Logic 43
byte for expression evaluation, for a total of 4 x 3+ 1 = 13 bytes. The drawback of a 0-
address computer is that operands must always be in the top two stack locations, and extra
instructions may be required to get them there. Stack machines, like stack calculators,
have their adherents. General register machines have achieved more popularity in recent
times, however, probably because they are more amenable to machine hardware speedup
jues, such as pipelining, that run instructions in parallel. We cover pipeline
techniques in Chapter 5.
Example 2.1 Expression Evaluation with 3-, 2-, 1-, and 0-Address
Machines Evaluate the expressiona = (b + c)* (d - e) in3-, and 0-
address machines. For these machines, minimal code to evaluate this expression is shown
in the following table:
3-Address 2-Address Accumulator Stack
add a,b,c Toad a,b Ida b push b
mpy a,a,d add a,c add c push c
sub a,a,e mpy a.d mpy d add
sub a,e sub e push d
staa mpy
push e
sub
Pop a a
General Register Machines and 1-1/2 Address Instructions. Accumulator and
stack machines offer the advantage that an intermediate result can be retained in
the CPU. Complex operations make it worthwhile to store more than one temporary
result in the CPU. Supplying more than one register to the CPU requires the use of
instruction bits to encode which register is to be used, but specifying one of n registers register
requires only log,(n) bits, and temporary registers can make instruction execution much Selection
more efficient. A common type of ALU instruction in the general register machine is
the analog of the 3-address instruction, but it uses CPU registers instead of memory
addresses. These “small” addresses that specify registers instead of memory addresses
are sometimes referred to as half addresses. An instruction that specifies one operand in half addresses
memory and one operand in a register would be known as a 1-1/2 address instruction,
Figure 2.8 shows the structure and formats of a general register machine. The figure
shows both a load from memory to register R8 and an add of registers R6 and R4 with
the result stored in R2.
‘Our encoding example is now a bit more complex. Assume that there are 32 general-
Purpose registers. Each register reference thus requires 5 bits to specify 1 of the 32 registers,
and our 3-register add instruction now requires 5 x 3 +7 = 22 bits, which means we can
encode the instruction in one 24-bit word. The load instruction will require 7 + 5 +24 bits,
or two 24-bit words.
General register machines provide the greatest flexibility to the programmer, and
virtually every new architecture since 1980 has been of this class. You may wonder at44 Computer Systems Design and Architecture
Instruction formats.
load load RA, Opt (RB < Opt)
addr _Opt_}-+-284 >| Re
ba = [leag [Rs | Optaaar
Re
add R2, 4, RG (R2.— Ra + Re)
add | re | a | 6
Nextt Program
Fig. 2.8 General Register Machine and Instruction Formats
‘machines
register-memory
machines
memory-
the seeming progression from accumulator machines to general register machines. This
is due in large part to the reduction in the cost of machine memory. When RAM is $15
per megabyte, having many general purpose registers is a fine idea. It was not so fine
in the heyday of the accumulator, during the early days of computing, when a single bit
cost $25.
Classifying Machines by Operand and Result Location. Bear in mind that the
preceding classes are hypothetical. First of all, there are no 4-address machines. Second,
nearly all real machines provide some combination of the above instruction classes. The
VAXIL is probably the champion in this category, as it includes instructions from all
classes, Real machines are usually classed as being in the load-store, register-memory, or
memory-memory classes.
Many modern computers, including RISCs, are of the load/store, sometimes called
register-to-register, variety. These are 1-1/2-address machines in which memory access
instructions are limited to two instructions: load and store. The load instruction moves
data from memory to a processor register, and the store instruction moves data from the
processor to memory. ALU and branch operations in load-store machines can accept
only operands located in processor registers, and they must store the result in a processor
register. Load/store machines are sometimes called register-to-register machines because
ALU operations must have operands and results in registers. The philosophy is that moving
data values back and forth between memory and the processor is an expensive operation,
and that the instruction set design should discourage this operation by limiting its usage to
just a few explicit load and store instructions.
Register-memory machines locate operands and result in a combination of memory
and registers. They are classed as 1- or 1-1/2-address machines, in which one operand or
the result must be an accumulator or general register. Memory-to-memory machines allow
both the operands and the result to reside in memory. They are classed as either 2- or 3-
address machines, depending on whether one of the operand locations also serves as a
result location,Machines, Machine Languages, and Digital Logic 45
Key Concepts: Trade-Offs in Instruction Set and
Processor Registers
‘The range of choices of processor-state structure and instruction types trade off flexibility
in the placement of operands and results against the amount of information that must be
specified by an instruction.
The 3-address machines have the shortest code sequences but require an
unreasonably large number of bits per instruction.
The O-address machines have the longest code sequences and the shortest
individual instructions.
Even in O-address machines there are I-address instructions, push and pop.
General register machines modify the addressing rules by specifying one of a
small set of registers, such as 32, by a short address, such as 5 bits. A general
register counterpart to a 3-address instruction could specify 2 registers and |
memory address.
H _Load-store machines only include full memory addresses in instructions that
move data between memory and registers: load and store.
Current technology makes register access much faster than memory access
and places a premium on short instructions. Both favor the general register
organization.
2.2.6 Access PatHs To Oreranos: Appressinc Moves
‘The computation process involves the continuous shuffling of program and data into and
‘out of the CPU. The many different kinds of data structures and program variable references
possible in modem high-level languages have driven machine architects to develop many
sophisticated ways of providing access paths to operands in memory and CPU registers:
addressing modes. Below we provide an informal description of some of the more common
ones. After we discuss the SRC computer and the RTN description language, we use RTN to
formally describe these and other less common addressing modes. Be aware that the terms
used to describe addressing modes are not standardized in any way, and different machine
and assembly language designers have their own terminology for addressing modes.
To access an operand in memory, the CPU must first generate an address, which it
then issues to the memory subsystem. That address is referred to as an effective address.
‘The address may be computed in various ways, and the results of that computation may
be interpreted in various ways. The process may involve computing an address in memory
where the operand address is stored. Some of the more common addressing modes are
shown in Figure 2.9.
The immediate addressing mode, shown in Figure 2.9a, is used to access constants
stored in the instruction. It supplies an operand without computing an address, so it is
not used for result addressing. Immediate addressing provides one of the two means of
introducing constants into a program. (The other way is by storing the constants as data
items in memory and retrieving them by direct addressing.)
‘The previous code examples haveused the direct addressing mode, shown in Figure 2.9b.
‘The address of the operand is specified as a constant contained in the instruction.
addressing
effective address
immediate
addressing
direct addressing46 Computer Systems Design and Architecture
(@) immediate addressing:
Instruction contains.
the operand
(b) Direct addressing:
instruction contains
address of operand
Adres of adress of A —
(0) indirect aderoasing:
instruction contains ‘ner [Op TS
dcr of address
of operand foad (A), ...
>|Operand ada
(6) Register direct addressing: oad Rt verter
register contains operand ° ni [—Operand
(0) Register indirect addressing:
register contains address ne (Uber oor
of operand T—+|_Operang
load [R2)..
(f Displacement (based or
indexed) addressing:
address of operand =
register + constant
(g) Relative addressing: ‘str [Op +
address of operand =
PC + constant \
(h) Implied addressing: Instr [Op
instruction implies cme
the address of operand
Fig. 2.9 Common Addressing Modes
indirect In indirect addressing, shown in Figure 2.9c, a constant in the instruction specifies not
addressing mode the address of the value, but the address of the address of the value. An example of the use
of indirect addressing is in implementing pointers, where the pointer, which is an address,
is stored in memory at the pointer address. Thus, two memory references are required toMachines, Machine Languages, and Digital Logic 47
access the value. The CPU must first fetch the pointer, which is stored in memory; then,
having that address, the CPU accesses the value stored at that address.
In the register direct mode, shown in Figure 2.94, the operand is contained in the
specified register.
‘When the address of the operand is in a register, the mode is referred to as the register
indirect mode, shown in Figure 2.9e. This addressing mode is used to sequentially access
the elements of an array stored in memory. The starting address of the array is stored in a
register, an access made, and the register incremented to point to the next element.
To access arrays, or components of the C struct, or the Pascal record (which by
definition are stored at a fixed offset from the start address of the structure), the indexed
‘mode, sometimes called displacement or based addressing. is used. as shown in Figure 2.9f.
‘The memory address is formed by adding a fixed constant, usually contained within the
instruction, to the address value contained in a register. The term indexed is normally used
when the constant value is the base of an array in memory, added to the “index” stored in a
register. The term displacement is used when the base of a Struct is held in a register and
added to the constant offset, or displacement, of the field in the struct.
The relative addressing mode, shown in Figure 2.9, is similar to indexed, but the base
address is held in the PC rather than in another register. This allows the storage of memory
operands at a fixed offset from the current instruction.
In implied addressing mode, the address of the operand(s) is implied by the instruction
and need not be mentioned explicitly. For example, the instruction CMC shown in
Figure 2.9h stands for “complement the carry flag”. The operand being the carry flag is
implied in this instruction. Similarly, consider a hypothetical instruction MUL, which
always multiplies the values present in (wo fixed registers say °X’ and *Y’ and stores the
product in the register ‘P’. This instruction uses implied addressing mode. The implied
address mode has the disadvantage that itis least flexible, but has the advantage that the
instruction can be encoded in a compact manner.
The previous discussion is only to provide a flavor for the complexity of addressing
modes, A formal description of addressing modes is given in Section 2.5,
2.3 Informal Description of the Simple RISC Computer, SRC
In this section we provide an informal description of SRC. and in the next section we
provide a formal description. This example machine is sufficiently simple and lacking in
the complications necessary in real machines that in Chapters 4 and 5 it will serve as an
example of detailed machine hardware design
2.3.1 RecisteR AND Memory Structure
Figure 2.10 shows the programmer's model of the SRC machine. Itisa general register machine,
with 32 general purpose, 32-bit registers, plus a program counter (PC) and an instruction
register (IR). Although the main memory is organized as an array of bytes, only 32-bit words
can be fetched from or stored into main memory. Its memory operand access follows the load-
store model described previously. A word at address A is defined as the 4 bytes at that address
and the succeeding three addresses. The byte at the lowest address contains the most significant
8 bits, the byte at the next address contains the next most significant 8 bits, and so on.
register direct
mode
register indirect
mode
indexed, or
displacement, o
based mode
addressing mode48 Computer Systems Design and Architecture
‘TheSRCCPU Main memory
at ° 70
RO} 32.32-bit | 0
[general —| =
[- purpose —] bytes
egisters of
rif main].
Po memory
iR ee
Instruction formats
feidstle, Bt vy 2
‘addi, andi, ori (oote te |}
weer °
2d stl (OP yet)
28222117 9
8. neg, not ruse
nites
zaman wn
abr (oot fete arms Gad
as gparsre is
see fopfa te Pre | cay aed ond
as gzarirse wis
Sasson Pootate [e | vasa —}
S acd ew, Foote Tm Le]
‘ay_z126 220197
™ [oot [> 1) unwed coma]
sha
shi.she ay crap 2221716 12
v» (Optra [0 re [eS rms
peed
|
2.3.2 Instruction Formats
[7] means contents
of register 7
‘M[32] means contents
‘of memory location 32
Example
1013.4 (RI3} = MIA})
1813, 4(r5) (R(3] = MERIS) + 41)
addi 2,4,1 (Rl) = Rid] +1)
dr 15,8 (AIS) = MIPC + 8)
lart6,45 (RI) = PC + 45)
regi. (RI7]=-AIB)
bear 4, 0
(branch to A] if RO} == 0)
bring 6, 14, 10
{(R{6) = PC; branch to R(t R(O] +0)
‘add 10,2, 4 (R(O} = RZ] + RA)
she 10,1144
(PiO} = Fs} shited right by 4 bits
shi 2,14, 16
(P{2} = F[s]shitted leh by count in RI6))
stop
Fig. 2.10 Programmer's Model of the SRC
Figure 2.10 shows 23 instructions in 8 different formats:
@ Load and store instructions: There are four load instructions—1d, 1dr, 1a,
and 1ar—and two store instructions—st and str.
@ Branch instructions: There are two branch instructions, br and br, that allow
unconditional and conditional branches to an address contained in a specified