
CHAPTER 4

Assemblers

4.1 ELEMENTS OF ASSEMBLY LANGUAGE PROGRAMMING


An assembly language is a machine dependent, low level programming language
which is specific to a certain computer system (or a family of computer systems).
Compared to the machine language of a computer system, it provides three basic
features which simplify programming:
1. Mnemonic operation codes: Use of mnemonic operation codes (also called
mnemonic opcodes) for machine instructions eliminates the need to memorize
numeric operation codes. It also enables the assembler to provide helpful
diagnostics, for example indication of misspelt operation codes.
2. Symbolic operands: Symbolic names can be associated with data or instructions.
These symbolic names can be used as operands in assembly statements.
The assembler performs memory bindings to these names; the programmer
need not know any details of the memory bindings performed by the assembler.
This leads to a very important practical advantage during program modification
as discussed in Section 4.1.2.
3. Data declarations: Data can be declared in a variety of notations, including the
decimal notation. This avoids manual conversion of constants into their internal
machine representation, for example, conversion of -5 into (11111010)2
or 10.5 into (41A80000)16.
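The two conversions quoted above can be checked with a short sketch. It rests on two assumptions that the text does not state: that (11111010)2 is an 8-bit one's complement pattern, and that (41A80000)16 is a 32-bit base-16 floating point word with a sign bit, an excess-64 exponent and a 24-bit fraction. The function names are illustrative only.

```python
def ones_complement(value: int, bits: int = 8) -> str:
    """Return the one's complement bit pattern of value (assumed encoding)."""
    if value >= 0:
        return format(value, f"0{bits}b")
    # Invert every bit of the magnitude and keep the low `bits` bits.
    return format(~(-value) & ((1 << bits) - 1), f"0{bits}b")


def hex_float_word(value: float) -> str:
    """Encode value as sign | excess-64 base-16 exponent | 24-bit fraction (assumed encoding)."""
    if value == 0:
        return "00000000"
    sign = 0x80 if value < 0 else 0x00
    fraction, exponent = abs(value), 0
    # Normalise so that 1/16 <= fraction < 1.
    while fraction >= 1:
        fraction /= 16
        exponent += 1
    while fraction < 1 / 16:
        fraction *= 16
        exponent -= 1
    word = ((sign | (64 + exponent)) << 24) | int(fraction * (1 << 24))
    return format(word, "08X")


print(ones_complement(-5))    # 11111010
print(hex_float_word(10.5))   # 41A80000
```

This is the kind of manual conversion that data declarations make unnecessary.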
Statement format
An assembly language statement has the following format:

    [Label] <opcode> <operand spec> [, <operand spec> ..]

where the notation [..] indicates that the enclosed specification is optional. If a label
is specified in a statement, it is associated as a symbolic name with the memory
word(s) generated for the statement. <operand spec> has the following syntax:

    <symbolic name> [+<displacement>] [(<index register>)]

Thus, some possible operand forms are: AREA, AREA+5, AREA(4), and AREA+5(4).
The first specification refers to the memory word with which the name AREA is
associated. The second specification refers to the memory word 5 words away from the
word with the name AREA. Here '5' is the displacement or offset from AREA. The third
specification implies indexing with index register 4, that is, the operand address is
obtained by adding the contents of index register 4 to the address of AREA. The last
specification is a combination of the previous two specifications.
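The four operand forms can be illustrated with a minimal sketch that parses an operand specification and computes the operand address. The address 500 for AREA and the contents 20 of index register 4 are made-up values, and `operand_address` is a hypothetical helper, not part of any assembler described here.

```python
import re

# <symbolic name> [+<displacement>] [(<index register>)]
OPERAND = re.compile(
    r"^\s*([A-Za-z]\w*)\s*(?:\+\s*(\d+))?\s*(?:\(\s*(\d+)\s*\))?\s*$"
)


def operand_address(spec, symbols, index_regs):
    """Return the address denoted by an operand specification."""
    match = OPERAND.match(spec)
    if match is None:
        raise ValueError(f"malformed operand specification: {spec!r}")
    name, displacement, index = match.groups()
    address = symbols[name]                    # address bound to the name
    if displacement is not None:
        address += int(displacement)           # offset from the name
    if index is not None:
        address += index_regs[int(index)]      # contents of the index register
    return address


symbols = {"AREA": 500}      # assumed memory binding
index_regs = {4: 20}         # assumed contents of index register 4
for spec in ("AREA", "AREA+5", "AREA(4)", "AREA+5(4)"):
    print(spec, operand_address(spec, symbols, index_regs))
```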

A simple assembly language

In the first half of the chapter we use a simple assembly language to illustrate features
of assembly languages and techniques used in assemblers. In this language, each
statement has two operands. The first operand is always a register, which can be any
one of AREG, BREG, CREG and DREG. The second operand refers to a memory word
using a symbolic name and an optional displacement. (Note that indexing is not
permitted.)

Instruction   Assembly
opcode        mnemonic    Remarks

00            STOP        Stop execution
01            ADD         First operand is modified, condition code is set
02            SUB         First operand is modified, condition code is set
03            MULT        First operand is modified, condition code is set
04            MOVER       Register ← memory move
05            MOVEM       Memory ← register move
06            COMP        Sets condition code
07            BC          Branch on condition
08            DIV         Analogous to SUB
09            READ        First operand is not used
10            PRINT       First operand is not used

Fig. 4.1 Mnemonic operation codes

Figure 4.1 lists the mnemonic opcodes for machine instructions. The MOVE
instructions move a value between a memory word and a register. In the MOVER
instruction the second operand is the source operand and the first operand is the target
operand. The converse is true for the MOVEM instruction. All arithmetic is performed
in a register (i.e. the result replaces the contents of a register) and sets a condition
code. A comparison instruction sets a condition code analogous to a subtract
instruction without affecting the values of its operands. The condition code can be tested by
a Branch on Condition (BC) instruction. The assembly statement corresponding to it
has the format

    BC <condition code spec>, <memory address>
81 Systems Programming & Operating Systems

It transfers control to the memory word with the address <memory address> if the
current value of condition code matches <condition code spec>. For simplicity, we
assume <condition code spec> to be a character string with obvious meaning, e.g.
GT, EQ, etc. A BC statement with the condition code spec ANY implies unconditional
transfer of control. In a machine language program, we show all addresses and
constants in decimal rather than in octal or hexadecimal.
Figure 4.2 shows the machine instruction format. The opcode, register operand
and memory operand occupy 2, 1 and 3 digits, respectively. The sign is not a part
of the instruction. The condition code specified in a BC statement is encoded into
the first operand position using the codes 1-6 for the specifications LT, LE, EQ, GT,
GE and ANY, respectively. Figure 4.3 shows an assembly language program and an
equivalent machine language program.
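The instruction format described above can be sketched as a small formatting routine. The register codes 1-4 for AREG-DREG follow the convention used later in the chapter for intermediate code; the function name `encode` and the printed layout (sign, then the three fields separated by blanks) are choices made for this illustration.

```python
CONDITION_CODES = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def encode(opcode: int, first_operand: int, address: int) -> str:
    """Format an instruction as sign, 2-digit opcode, 1-digit operand, 3-digit address."""
    if not (0 <= opcode <= 99 and 0 <= first_operand <= 9 and 0 <= address <= 999):
        raise ValueError("field does not fit the instruction format")
    return f"+ {opcode:02d} {first_operand} {address:03d}"


# READ N, with N at address 113 (the first operand is not used)
print(encode(9, 0, 113))
# BC LE, AGAIN, with AGAIN at address 104 (condition code in the first operand position)
print(encode(7, CONDITION_CODES["LE"], 104))
# STOP
print(encode(0, 0, 0))
```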

    sign | opcode | reg operand | memory operand

Fig. 4.2 Instruction format

        START   101
        READ    N             101)  + 09 0 113
        MOVER   BREG, ONE     102)  + 04 2 115
        MOVEM   BREG, TERM    103)  + 05 2 116
AGAIN   MULT    BREG, TERM    104)  + 03 2 116
        MOVER   CREG, TERM    105)  + 04 3 116
        ADD     CREG, ONE     106)  + 01 3 115
        MOVEM   CREG, TERM    107)  + 05 3 116
        COMP    CREG, N       108)  + 06 3 113
        BC      LE, AGAIN     109)  + 07 2 104
        MOVEM   BREG, RESULT  110)  + 05 2 114
        PRINT   RESULT        111)  + 10 0 114
        STOP                  112)  + 00 0 000
N       DS      1             113)
RESULT  DS      1             114)
ONE     DC      '1'           115)  + 00 0 001
TERM    DS      1             116)
        END

Fig. 4.3 An assembly and equivalent machine language program

4.1.1 Assembly Language Statements


An assembly program contains three kinds of statements:
1. Imperative statements
2. Declaration statements
3. Assembler directives.

Imperative statements

An imperative statement indicates an action to be performed during the execution
of the assembled program. Each imperative statement typically translates into one
machine instruction.
Declaration statements
The syntax of declaration statements is as follows:

    [Label] DS <constant>
    [Label] DC '<value>'

The DS (short for declare storage) statement reserves areas of memory and
associates names with them. Consider the following DS statements:

    A   DS   1
    G   DS   200

The first statement reserves a memory area of 1 word and associates the name A with
it. The second statement reserves a block of 200 memory words. The name G is
associated with the first word of the block. Other words in the block can be accessed
through offsets from G, e.g. G+5 is the sixth word of the memory block, etc.
The DC (short for declare constant) statement constructs memory words
containing constants. The statement

    ONE   DC   '1'

associates the name ONE with a memory word containing the value '1'. The programmer
can declare constants in different forms: decimal, binary, hexadecimal, etc. The
assembler converts them to the appropriate internal form.

Use of constants
Contrary to the name 'declare constant', the DC statement does not really implement
constants, it merely initializes memory words to given values. These values are not
protected by the assembler; they may be changed by moving a new value into the
memory word. For example, in Fig. 4.3 the value of ONE can be changed by executing
an instruction MOVEM BREG, ONE.
An assembly program can use constants in the sense implemented in an HLL in
two ways: as immediate operands, and as literals. Immediate operands can be used
in an assembly statement only if the architecture of the target machine includes the
necessary features. In such a machine, the assembly statement

    ADD AREG, 5

is translated into an instruction with two operands: AREG and the value '5' as an
immediate operand. Note that our simple assembly language does not support this
feature, whereas the assembly language of Intel 8086 supports it (see Section 4.5).
        ADD  AREG, ='5'      =>              ADD  AREG, FIVE
                                     FIVE    DC   '5'
             (a)                                  (b)

Fig. 4.4 Use of literals in an assembly program

A literal is an operand with the syntax ='<value>'. It differs from a constant
because its location cannot be specified in the assembly program. This helps to
ensure that its value is not changed during execution of a program. It differs from an
immediate operand because no architectural provision is needed to support its use.
An assembler handles a literal by mapping its use into other features of the assembly
language. Figure 4.4(a) shows use of a literal ='5'. Figure 4.4(b) shows an equivalent
arrangement using a DC statement FIVE DC '5'. When the assembler encounters
the use of a literal in the operand field of a statement, it handles the literal using
an arrangement similar to that shown in Fig. 4.4(b): it allocates a memory word to
contain the value of the literal, and replaces the use of the literal in a statement by
an operand expression referring to this word. The value of the literal is protected by
the fact that the name and address of this word are not known to the assembly language
programmer.

Assembler directives
Assembler directives instruct the assembler to perform certain actions during the
assembly of a program. Some assembler directives are described in the following.

    START <constant>

This directive indicates that the first word of the target program generated by the
assembler should be placed in the memory word with address <constant>.

    END [<operand spec>]

This directive indicates the end of the source program. The optional <operand
spec> indicates the address of the instruction where the execution of the program
should begin. (By default, execution begins with the first instruction of the assembled
program.)

4.1.2 Advantages of Assembly Language


The primary advantages of assembly language programming vis-a-vis machine
language programming arise from the use of symbolic operand specifications. Consider
the machine and assembly language statements of Fig. 4.3 once again. The programs
presently compute N!. Figure 4.5 shows a changed program to compute ½ × N!, where
rectangular boxes are used to highlight changes in the program. One statement has
been inserted before the PRINT statement to implement division by 2. In the
machine language program, this leads to changes in addresses of constants and reserved
memory areas. Because of this, addresses used in most instructions of the program
had to change. Such changes are not needed in the assembly program since operand
specifications are symbolic in nature.
        START   101
        READ    N             101)  + 09 0 [114]
        MOVER   BREG, ONE     102)  + 04 2 [116]
        MOVEM   BREG, TERM    103)  + 05 2 [117]
AGAIN   MULT    BREG, TERM    104)  + 03 2 [117]
        MOVER   CREG, TERM    105)  + 04 3 [117]
        ADD     CREG, ONE     106)  + 01 3 [116]
        MOVEM   CREG, TERM    107)  + 05 3 [117]
        COMP    CREG, N       108)  + 06 3 [114]
        BC      LE, AGAIN     109)  + 07 2 104
       [DIV     BREG, TWO]    110)  + 08 2 118
        MOVEM   BREG, RESULT  111)  + 05 2 115
        PRINT   RESULT        112)  + 10 0 115
        STOP                  113)  + 00 0 000
N       DS      1             114)
RESULT  DS      1             115)
ONE     DC      '1'           116)  + 00 0 001
TERM    DS      1             117)
TWO     DC      '2'           118)  + 00 0 002
        END

Fig. 4.5 Modified assembly and machine language programs

Assembly language programming holds an edge over HLL programming in
situations where it is necessary or desirable to use specific architectural features of a
computer, for example, special instructions supported by the CPU.

4.2 A SIMPLE ASSEMBLY SCHEME


The fundamental translation model is motivated by Definition 1.2. In this section we
use this model to develop preliminary ideas on the design of an assembler. We will
use these ideas in Sections 4.4 and 4.5.

Design specification of an assembler


We use a four step approach to develop a design specification for an assembler:
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain and maintain the information.
4. Determine the processing necessary to perform the task.
The fundamental information requirements arise in the synthesis phase of an
assembler. Hence it is best to begin by considering the information requirements
of the synthesis tasks. We then consider how to make this information available, i.e.
whether it should be collected during analysis or derived during synthesis.

Synthesis phase
Consider the assembly statement

    MOVER BREG, ONE

in Fig. 4.3. We must have the following information to synthesize the machine
instruction corresponding to this statement:
1. Address of the memory word with which name ONE is associated.
2. Machine operation code corresponding to the mnemonic MOVER.
The first item of information depends on the source program. Hence it must
be made available by the analysis phase. The second item of information does not
depend on the source program, it merely depends on the assembly language. Hence
the synthesis phase can determine this information for itself.
Based on the above discussion, we consider the use of two data structures during
the synthesis phase:
1. Symbol table
2. Mnemonics table.
Each entry of the symbol table has two primary fields: name and address. The
table is built by the analysis phase. An entry in the mnemonics table has two primary
fields: mnemonic and opcode. The synthesis phase uses these tables to obtain the
machine address with which a name is associated, and the machine opcode
corresponding to a mnemonic, respectively. Hence the tables have to be searched with the
symbol name and the mnemonic as keys.
Analysis phase
The primary function performed by the analysis phase is the building of the symbol
table. For this purpose it must determine the addresses with which the symbolic
names used in a program are associated. It is possible to determine some addresses
directly, e.g. the address of the first instruction in the program, however others must
be inferred. Consider the assembly program of Fig. 4.3. To determine the address of
N, we must fix the addresses of all program elements preceding it. This function is
called memory allocation.
To implement memory allocation a data structure called location counter (LC) is
introduced. The location counter is always made to contain the address of the next
memory word in the target program. It is initialized to the constant specified in the
START statement. Whenever the analysis phase sees a label in an assembly statement,
it enters the label and the contents of LC in a new entry of the symbol table. It then
finds the number of memory words required by the assembly statement and updates
the LC contents. (Hence the word 'counter' in 'location counter'.) This ensures
that LC points to the next memory word in the target program even when machine
instructions have different lengths and DS/DC statements reserve different amounts
of memory. To update the contents of LC, the analysis phase needs to know the lengths of
different instructions. This information simply depends on the assembly language,
hence the mnemonics table can be extended to include this information in a new field
called length. We refer to the processing involved in maintaining the location counter
as LC processing.
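LC processing as described above can be sketched in a few lines. The sketch assumes that every instruction of the simple language occupies one word, that a DC statement generates one word, and that statements are already split into (label, mnemonic, operand) tuples; the names `build_symbol_table` and `FIG_4_3` are illustrative.

```python
# Assumed length field of the mnemonics table: one word per instruction.
INSTRUCTION_LENGTH = {m: 1 for m in (
    "STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM",
    "COMP", "BC", "DIV", "READ", "PRINT")}


def build_symbol_table(statements):
    """Run LC processing over (label, mnemonic, operand) tuples."""
    symtab, lc = {}, 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            lc = int(operand)          # LC is initialized from START
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc         # enter (label, <LC contents>)
        if mnemonic == "DS":
            lc += int(operand)         # reserve the requested words
        elif mnemonic == "DC":
            lc += 1                    # one word holding the constant
        else:
            lc += INSTRUCTION_LENGTH[mnemonic]
    return symtab


FIG_4_3 = [
    (None, "START", "101"), (None, "READ", "N"),
    (None, "MOVER", "BREG, ONE"), (None, "MOVEM", "BREG, TERM"),
    ("AGAIN", "MULT", "BREG, TERM"), (None, "MOVER", "CREG, TERM"),
    (None, "ADD", "CREG, ONE"), (None, "MOVEM", "CREG, TERM"),
    (None, "COMP", "CREG, N"), (None, "BC", "LE, AGAIN"),
    (None, "MOVEM", "BREG, RESULT"), (None, "PRINT", "RESULT"),
    (None, "STOP", None), ("N", "DS", "1"), ("RESULT", "DS", "1"),
    ("ONE", "DC", "1"), ("TERM", "DS", "1"), (None, "END", None),
]
print(build_symbol_table(FIG_4_3))
```

For the program of Fig. 4.3 this associates AGAIN with 104 and N with 113.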
Mnemonics table:
    mnemonic   opcode   length
    ADD        01       1
    SUB        02       1

    Analysis phase  ---->  Synthesis phase  ---->  Target program

Symbol table:
    symbol   address
    AGAIN    104
    N        113

Legend: solid arrows show data access, dashed arrows show control transfer.

Fig. 4.6 Data structures of the assembler

Figure 4.6 illustrates the use of the data structures by the analysis and synthesis
phases. Note that the Mnemonics table is a fixed table which is merely accessed by
the analysis and synthesis phases, while the Symbol table is constructed during
analysis and used during synthesis. The tasks performed by the analysis and synthesis
phases are as follows:

Analysis phase   1. Isolate the label, mnemonic opcode and operand fields
                    of a statement.
                 2. If a label is present, enter the pair (symbol,
                    <LC contents>) in a new entry of the symbol table.
                 3. Check validity of the mnemonic opcode
                    through a look-up in the Mnemonics table.
                 4. Perform LC processing, i.e. update the value
                    contained in LC by considering the opcode
                    and operands of the statement.

Synthesis phase  1. Obtain the machine opcode corresponding to
                    the mnemonic from the Mnemonics table.
                 2. Obtain the address of a memory operand from
                    the Symbol table.
                 3. Synthesize a machine instruction or the machine
                    form of a constant, as the case may be.

4.3 PASS STRUCTURE OF ASSEMBLERS


In Section 1.3 we have defined a pass of a language processor as one complete scan of
the source program, or its equivalent representation (see Definition 1.4). We discuss
two pass and single pass assembly schemes in this section.

Two pass translation


Two pass translation of an assembly language program can handle forward
references easily. LC processing is performed in the first pass and symbols defined in the
program are entered into the symbol table. The second pass synthesizes the target
form using the address information found in the symbol table. In effect, the first pass
performs analysis of the source program while the second pass performs synthesis
of the target program. The first pass constructs an intermediate representation (IR)
of the source program for use by the second pass (see Fig. 4.7). This representation
consists of two main components: data structures, e.g. the symbol table, and a
processed form of the source program. The latter component is called intermediate code
(IC).
                        Data structures

    Source program -> Pass I -> Intermediate code -> Pass II -> Target program

Legend: solid arrows show data access, dashed arrows show control transfer.

Fig. 4.7 Overview of two pass assembly

Single pass translation
LC processing and construction of the symbol table proceed as in two pass
translation. The problem of forward references is tackled using a process called
backpatching. The operand field of an instruction containing a forward reference is left blank
initially. The address of the forward referenced symbol is put into this field when its
definition is encountered. In the program of Fig. 4.3, the instruction corresponding
to the statement

    MOVER BREG, ONE

can be only partially synthesized since ONE is a forward reference. Hence the
instruction opcode and address of BREG will be assembled to reside in location 102.
The need for inserting the second operand's address at a later stage can be indicated
by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair
(<instruction address>, <symbol>), e.g. (102, ONE) in this case.
By the time the END statement is processed, the symbol table would contain the
addresses of all symbols defined in the source program and TII would contain
information describing all forward references. The assembler can now process each
entry in TII to complete the concerned instruction. For example, the entry (102, ONE)
would be processed by obtaining the address of ONE from the symbol table and inserting
it in the operand address field of the instruction with assembled address 102.
Alternatively, entries in TII can be processed in an incremental manner. Thus, when
the definition of some symbol symb is encountered, all forward references to symb can
be processed.
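Backpatching through a table of incomplete instructions can be sketched as follows. This is a minimal illustration under simplifying assumptions: every instruction and DC constant occupies one word, statements arrive as (label, mnemonic, register, operand) tuples, and entries are patched only when END is reached. The three-statement program and the start address 200 are made up for the example; `single_pass` is a hypothetical name.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def single_pass(statements, start):
    """Assemble in one pass, recording forward references for later patching."""
    symtab = {}
    incomplete = []   # pairs (<instruction address>, <symbol>)
    memory = {}       # address -> [opcode, register code, operand address]
    lc = start
    for label, mnemonic, register, operand in statements:
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operand)
            continue
        if mnemonic == "DC":
            memory[lc] = [0, 0, int(operand)]
            lc += 1
            continue
        address = symtab.get(operand) if operand else 0
        if operand and address is None:
            incomplete.append((lc, operand))   # operand field left blank for now
        memory[lc] = [OPCODES[mnemonic], REGISTERS.get(register, 0), address]
        lc += 1
    # At END every defined symbol has an address, so each entry can be completed.
    for instruction_address, symbol in incomplete:
        memory[instruction_address][2] = symtab[symbol]
    return memory, incomplete


program = [
    (None, "MOVER", "BREG", "ONE"),   # forward reference to ONE
    (None, "STOP", None, None),
    ("ONE", "DC", None, "1"),
]
memory, incomplete = single_pass(program, start=200)
print(incomplete)      # [(200, 'ONE')]
print(memory[200])     # [4, 2, 202]
```

The incremental alternative mentioned in the text would instead patch all pending entries for a symbol at the moment its label is entered in the symbol table.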

4.4 DESIGN OF A TWO PASS ASSEMBLER


Tasks performed by the passes of a two pass assembler are as follows:

Pass I    1. Separate the symbol, mnemonic opcode and operand
             fields.
          2. Build the symbol table.
          3. Perform LC processing.
          4. Construct intermediate representation.

Pass II   Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate
representation while Pass II processes the intermediate representation to synthesize the
target program. The design details of assembler passes are discussed after introducing
advanced assembler directives and their influence on LC processing.
4.4.1 Advanced Assembler Directives
ORIGIN
The syntax of this directive is

    ORIGIN <address spec>

where <address spec> is an <operand spec> or <constant>. This directive
indicates that LC should be set to the address given by <address spec>. The ORIGIN
statement is useful when the target program does not consist of consecutive memory
words. The ability to use an <operand spec> in the ORIGIN statement provides the
ability to perform LC processing in a relative rather than absolute manner.
Example 4.1 illustrates the differences between the two.

Example 4.1 Statement number 18 of Fig. 4.8(a), viz. ORIGIN LOOP+2, sets LC to the value
204, since the symbol LOOP is associated with the address 202. The next statement,
viz.

    MULT CREG, B

is therefore given the address 204. The statement ORIGIN LAST+1 sets LC to address
217. Note that an equivalent effect could have been achieved by using the statements
ORIGIN 204 and ORIGIN 217 at these two places in the program, however the
absolute addresses used in these statements would need to be changed if the address
specification in the START statement is changed.

EQU

The EQU statement has the syntax

    <symbol> EQU <address spec>

where <address spec> is an <operand spec> or <constant>.

The EQU statement defines the symbol to represent <address spec>. This
differs from the DC/DS statement as no LC processing is implied. Thus EQU simply
associates the name <symbol> with <address spec>.

Example 4.2 Statement 22 of Fig. 4.8(a), viz. BACK EQU LOOP, introduces the symbol BACK
to represent the operand LOOP. This is how the 16th statement, viz.

    BC LT, BACK

is assembled as '+ 07 1 202'.
 1          START   200
 2          MOVER   AREG, ='5'     200)  +04 1 211
 3          MOVEM   AREG, A        201)  +05 1 217
 4  LOOP    MOVER   AREG, A        202)  +04 1 217
 5          MOVER   CREG, B        203)  +04 3 218
 6          ADD     CREG, ='1'     204)  +01 3 212
 7          ...
12          BC      ANY, NEXT      210)  +07 6 214
13          LTORG
                    ='5'           211)  +00 0 005
                    ='1'           212)  +00 0 001
14          ...
15  NEXT    SUB     AREG, ='1'     214)  +02 1 219
16          BC      LT, BACK       215)  +07 1 202
17  LAST    STOP                   216)  +00 0 000
18          ORIGIN  LOOP+2
19          MULT    CREG, B        204)  +03 3 218
20          ORIGIN  LAST+1
21  A       DS      1              217)
22  BACK    EQU     LOOP
23  B       DS      1              218)
24          END
25                  ='1'           219)  +00 0 001

Fig. 4.8 An assembly program illustrating ORIGIN

LTORG
Fig. 4.4 has shown how literals can be handled in two steps. First, the literal is treated
as if it is a <value> in a DC statement, i.e. a memory word containing the value of
the literal is formed. Second, this memory word is used as the operand in place of
the literal. Where should the assembler place the word corresponding to the literal?
Obviously, it should be placed such that control never reaches it during the execution
of a program. The LTORG statement permits a programmer to specify where literals
should be placed. By default, the assembler places the literals after the END statement.
At every LTORG statement, as also at the END statement, the assembler allocates
memory to the literals of a literal pool. The pool contains all literals used in the
program since the start of the program or since the last LTORG statement.

Example 4.3 In Fig. 4.8, the literals ='5' and ='1' are added to the literal pool in statements
2 and 6, respectively. The first LTORG statement (statement number 13) allocates the
addresses 211 and 212 to the values '5' and '1'. A new literal pool is now started. The
value '1' is put into this pool in statement 15. This value is allocated the address 219
while processing the END statement. The literal ='1' used in statement 15 therefore
refers to location 219 of the second pool of literals rather than location 212 of the first
pool. Thus, all references to literals are forward references by definition.

The LTORG directive has very little relevance for the simple assembly language
we have assumed so far. The need to allocate literals at intermediate points in the
program rather than at the end is critically felt in a computer using a base displacement
mode of addressing, e.g. computers of the IBM 360/370 family.
EXERCISE 4.4.1
1. An assembly program contains the statement
       X EQU Y+25
   Indicate how the EQU statement can be processed if
   (a) Y is a back reference,
   (b) Y is a forward reference.
2. Can the operand expression in an ORIGIN statement contain forward references? If
   so, outline how the statement can be processed in a two pass assembly scheme.
4.4.2 Pass I of the Assembler
Pass I uses the following data structures:

OPTAB     A table of mnemonic opcodes and related information
SYMTAB    Symbol table
LITTAB    A table of literals used in the program

Figure 4.9 illustrates sample contents of these tables while processing the
program of Fig. 4.8. OPTAB contains the fields mnemonic opcode, class and mnemonic
info. The class field indicates whether the opcode corresponds to an imperative
statement (IS), a declaration statement (DL) or an assembler directive (AD). If an
imperative, the mnemonic info field contains the pair (machine opcode, instruction length),
else it contains the id of a routine to handle the declaration or directive statement. A
SYMTAB entry contains the fields address and length. A LITTAB entry contains the
fields literal and address.

OPTAB:
    mnemonic opcode   class   mnemonic info
    MOVER             IS      (04, 1)
    DS                DL      R#7
    START             AD      R#11
    ...

SYMTAB:
    symbol   address   length
    LOOP     202       1
    NEXT     214       1
    LAST     216       1
    A        217       1
    BACK     202       1
    B        218       1

LITTAB:                           POOLTAB:
    no   literal   address            literal no
    1    ='5'                         #1
    2    ='1'                         #3
    3    ='1'

Fig. 4.9 Data structures of assembler Pass I

Processing of an assembly statement begins with the processing of its label field.
If it contains a symbol, the symbol and the value in LC are copied into a new entry
of SYMTAB. Thereafter, the functioning of Pass I centers around the interpretation
of the OPTAB entry for the mnemonic. The class field of the entry is examined to
determine whether the mnemonic belongs to the class of imperative, declaration or
assembler directive statements. In the case of an imperative statement, the length of
the machine instruction is simply added to the LC. The length is also entered in the
SYMTAB entry of the symbol (if any) defined in the statement. This completes the
processing of the statement.
For a declaration or assembler directive statement, the routine mentioned in the
mnemonic info field is called to perform appropriate processing of the statement. For
example, in the case of a DS statement, routine R#7 would be called. This routine
processes the operand field of the statement to determine the amount of memory
required by this statement, and appropriately updates the LC and the SYMTAB entry
of the symbol (if any) defined in the statement. Similarly, for an assembler directive
the called routine would perform appropriate processing, possibly affecting the value
in LC.
The use of LITTAB needs some explanation. The first pass uses LITTAB to
collect all literals used in a program. Awareness of different literal pools is maintained
using the auxiliary table POOLTAB. This table contains the literal number of the
starting literal of each literal pool. At any stage, the current literal pool is the last
pool in LITTAB. On encountering an LTORG statement (or the END statement),
literals in the current pool are allocated addresses starting with the current value in LC
and LC is appropriately incremented. Thus, the literals of the program in Fig. 4.8(a)
will be allocated memory in two steps. At the LTORG statement, the first two literals
will be allocated the addresses 211 and 212. At the END statement, the third literal
will be allocated address 219.
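The interplay of LITTAB and POOLTAB can be sketched with a small class. This is an illustration rather than the book's data layout: indices are 0-based here (the text numbers literals from 1), each literal is assumed to occupy one word, and the class and method names are hypothetical.

```python
class LiteralTable:
    """LITTAB plus POOLTAB, using 0-based indices."""

    def __init__(self):
        self.littab = []     # entries [literal, address]; address is None until allocated
        self.pooltab = [0]   # LITTAB index of the first literal of each pool

    def add(self, literal):
        """Record a literal used in the operand field of a statement."""
        self.littab.append([literal, None])

    def allocate_pool(self, lc):
        """At LTORG or END: give addresses to the current pool, return the new LC."""
        for entry in self.littab[self.pooltab[-1]:]:
            entry[1] = lc
            lc += 1          # assumption: one word per literal
        self.pooltab.append(len(self.littab))   # a new pool starts here
        return lc


# Literal usage of the program in Fig. 4.8
table = LiteralTable()
table.add("='5'")                    # statement 2
table.add("='1'")                    # statement 6
lc = table.allocate_pool(211)        # LTORG, statement 13
table.add("='1'")                    # statement 15
lc = table.allocate_pool(219)        # END, statement 24
print(table.littab)
print(table.pooltab)
```

The second ='1' gets its own entry and its own address, which is why statement 15 refers to location 219 rather than 212.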
We now present the algorithm for the first pass of the assembler. Intermediate
code forms for use in a two pass assembler are discussed in the next section.
Algorithm 4.1 (Assembler First Pass)

1. loc_cntr := 0; (default value)
   pooltab_ptr := 1; POOLTAB[1] := 1;
   littab_ptr := 1;
2. While next statement is not an END statement
   (a) If label is present then
         this_label := symbol in label field;
         Enter (this_label, loc_cntr) in SYMTAB.
   (b) If an LTORG statement then
       (i) Process literals LITTAB[POOLTAB[pooltab_ptr]] ... LITTAB[littab_ptr - 1]
           to allocate memory and put the address in the address
           field. Update loc_cntr accordingly.
       (ii) pooltab_ptr := pooltab_ptr + 1;
       (iii) POOLTAB[pooltab_ptr] := littab_ptr;
   (c) If a START or ORIGIN statement then
         loc_cntr := value specified in operand field;
   (d) If an EQU statement then
       (i) this_addr := value of <address spec>;
       (ii) Correct the symtab entry for this_label to (this_label, this_addr).
   (e) If a declaration statement then
       (i) code := code of the declaration statement;
       (ii) size := size of memory area required by DC/DS;
       (iii) loc_cntr := loc_cntr + size;
       (iv) Generate IC '(DL, code) ...'.
   (f) If an imperative statement then
       (i) code := machine opcode from OPTAB;
       (ii) loc_cntr := loc_cntr + instruction length from OPTAB;
       (iii) If operand is a literal then
               this_literal := literal in operand field;
               LITTAB[littab_ptr] := this_literal;
               littab_ptr := littab_ptr + 1;
             else (i.e. operand is a symbol)
               this_entry := SYMTAB entry number of operand;
               Generate IC '(IS, code)(S, this_entry)';
3. (Processing of END statement)
   (a) Perform step 2(b).
   (b) Generate IC '(AD,02)'.
   (c) Go to Pass II.
4.4.3 Intermediate Code Forms


In Section 1.3 two criteria for the choice of intermediate code, viz. processing
efficiency and memory economy, have been mentioned. In this section we consider
some variants of intermediate codes and compare them on the basis of these criteria.
The intermediate code consists of a set of IC units, each IC unit consisting of the
following three fields (see Fig. 4.10):
1. Address
2. Representation of the mnemonic opcode
3. Representation of operands.

    Address | Opcode | Operands

Fig. 4.10 An IC unit

Variant forms of intermediate codes, specifically the operand and address fields,
arise in practice due to the tradeoff between processing efficiency and memory
economy. These variants are discussed in separate sections dealing with the representation
of imperative statements, and declaration statements and directives, respectively. The
information in the mnemonic field is assumed to have the same representation in all
the variants.

Mnemonic field
The mnemonic field contains a pair of the form

    (statement class, code)

where statement class can be one of IS, DL and AD standing for imperative
statement, declaration statement and assembler directive, respectively. For an imperative
statement, code is the instruction opcode in the machine language. For declarations
and assembler directives, code is an ordinal number within the class. Thus, (AD, 01)
stands for assembler directive number 1 which is the directive START. Figure 4.11
shows the codes for various declaration statements and assembler directives.

    Declaration statements        Assembler directives

    DC      01                    START    01
    DS      02                    END      02
                                  ORIGIN   03
                                  EQU      04
                                  LTORG    05

Fig. 4.11 Codes for declaration statements and directives


4.4.4 Intermediate Code for Imperative Statements


We consider two variants of intermediate code which differ in the information
contained in their operand fields. For simplicity, the address field is assumed to contain
identical information in both variants.

Variant I
The first operand is represented by a single digit number which is a code for a
register (1-4 for AREG-DREG) or the condition code itself (1-6 for LT-ANY). The second
operand, which is a memory operand, is represented by a pair of the form

    (operand class, code)

where operand class is one of C, S and L standing for constant, symbol and literal,
respectively (see Fig. 4.12). For a constant, the code field contains the internal
representation of the constant itself. For example, the operand descriptor for the statement
START 200 is (C, 200). For a symbol or literal, the code field contains the ordinal
number of the operand's entry in SYMTAB or LITTAB. Thus entries for a symbol
XYZ and a literal ='25' would be of the form (S, 17) and (L, 35) respectively.

        START   200           (AD,01)  (C,200)
        READ    A             (IS,09)  (S,01)
LOOP    MOVER   AREG, A       (IS,04)  (1)(S,01)

        SUB     AREG, ='1'    (IS,02)  (1)(L,01)

        BC      GT, LOOP      (IS,07)  (4)(S,02)
        STOP                  (IS,00)
A       DS      1             (DL,02)  (C,1)
        LTORG                 (AD,05)

Fig. 4.12 Intermediate code - variant I

Note that this method of representing symbolic operands gives rise to one
peculiarity. We have so far assumed that an entry is made in SYMTAB only when a
symbol occurs in the label field of an assembly statement, e.g. an entry (A, 345, 1) if
symbol A is allocated one word at address 345. However, while processing a forward
reference
        MOVER   AREG, A
it is necessary to enter A in SYMTAB, say in entry number n, so that it can be
represented by (S, n) in IC. At this point, the address and length fields of A's entry
cannot be filled in. This implies that two kinds of entries may exist in SYMTAB at any
time: entries for defined symbols and entries for forward references. This fact should
be noted for use during error detection (see Section 4.4.7).
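
The two kinds of SYMTAB entries can be sketched as follows. This is a minimal Python illustration rather than a prescribed design: the class name SymTab and its methods reference, define and is_defined are hypothetical, and an undefined address is marked with None.

```python
class SymTab:
    """Symbol table whose entries are identified by ordinal number (1-based)."""

    def __init__(self):
        self.entries = []   # each entry: [name, address, length]
        self.index = {}     # name -> ordinal number

    def _ordinal(self, name):
        # On first sight of a name, create an entry with unknown address/length.
        if name not in self.index:
            self.entries.append([name, None, None])
            self.index[name] = len(self.entries)
        return self.index[name]

    def reference(self, name):
        """Symbol used as an operand: return its (S, n) descriptor."""
        return ("S", self._ordinal(name))

    def define(self, name, address, length):
        """Symbol seen in the label field: fill in address and length."""
        n = self._ordinal(name)
        entry = self.entries[n - 1]
        if entry[1] is not None:
            raise ValueError(f"duplicate definition of symbol {name}")
        entry[1], entry[2] = address, length
        return n

    def is_defined(self, name):
        return self.entries[self.index[name] - 1][1] is not None


symtab = SymTab()
operand = symtab.reference("A")      # forward reference in MOVER AREG, A
pending = symtab.is_defined("A")     # address not known yet
symtab.define("A", 345, 1)           # later, A is allocated one word at 345
```

An entry created by reference stays incomplete until define is called for the same name, which is the distinction the error detection of Section 4.4.7 relies on.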

Variant II

This variant differs from variant I of the intermediate code in that the operand fields of
the source statements are selectively replaced by their processed forms (see Fig. 4.13).
For declarative statements and assembler directives, processing of the operand fields
is essential to support LC processing. Hence these fields contain the processed forms.
For imperative statements, the operand field is processed only to identify literal
references. Literals are entered in LITTAB, and are represented as (L, m) in IC. Symbolic
references in the source statement are not processed at all during Pass I.
        START   200          (AD,01)  (C,200)
        READ    A            (IS,09)  A
LOOP    MOVER   AREG, A      (IS,04)  AREG, A

        SUB     AREG, ='1'   (IS,02)  AREG, (L,01)


        BC      GT, LOOP     (IS,07)  GT, LOOP
        STOP                 (IS,00)
A       DS      1            (DL,02)  (C,1)
        LTORG                (DL,05)

Fig. 4.13 Intermediate code - variant II
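
A rough Python sketch of the variant II treatment of an imperative statement's operand field is given below. The function name variant2_operands is hypothetical, LITTAB is modelled as a plain list, and literal pools are ignored for brevity; only the rule "replace literals, leave everything else in source form" is taken from the text.

```python
def variant2_operands(operand_field, littab):
    """Replace literal operands by (L, m); keep all other operands as source text."""
    processed = []
    for operand in (part.strip() for part in operand_field.split(",")):
        if operand.startswith("="):           # a literal such as ='1'
            if operand not in littab:
                littab.append(operand)
            processed.append(("L", littab.index(operand) + 1))
        else:                                 # register, condition code or symbol
            processed.append(operand)
    return processed


littab = []
sub_ic = variant2_operands("AREG, ='1'", littab)
mover_ic = variant2_operands("AREG, A", littab)
```

The symbol A is returned untouched, so SYMTAB is not consulted at all for operands in Pass I.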

Comparison of the variants

Variant I of the intermediate code appears to require extra work in Pass I since
operand fields are completely processed. However, this processing considerably
simplifies the tasks of Pass II; a look at the IC of Fig. 4.12 confirms this. The functions
of Pass II are quite trivial. To process the operand field of an imperative statement we
only need to refer to the appropriate table and obtain the operand address. Most
declarations do not require any processing, e.g. DC, DS (see Section 4.4.5), and START
statements, while some, e.g. LTORG, require marginal processing. The IC is quite
compact; it can be as compact as the target code itself if each operand reference
like (S, n) can be represented in the same number of bits as an operand address in a
machine instruction.
Variant II reduces the work of Pass I by transferring the burden of operand
processing from Pass I to Pass II of the assembler. The IC is less compact since the
memory operand of a typical imperative statement is in the source form itself. On
the other hand, by making Pass II perform more work, the functions and memory
requirements of the two passes get better balanced. Figure 4.14 illustrates the
advantages of this aspect. Part (a) of Fig. 4.14 shows memory utilization by an assembler
using variant I of IC. Some data structures, viz. symbol table, are passed in the
memory while IC is presumably written in a file. Since Pass I performs much more
processing than Pass II, its code occupies more memory than the code of Pass II. Part

(b) of Fig. 4.14 shows memory utilization when variant II of IC is used. The code
sizes of the two passes are now comparable, hence the overall memory requirement
of the assembler is lower.


[Figure: two memory maps, (a) and (b), each showing the code of Pass I and Pass II
together with the data structures and the work area]

Fig. 4.14 Memory requirements using (a) variant I, (b) variant II

Variant II is particularly well-suited if expressions are permitted in the operand
fields of an assembly statement. For example, the statement

        MOVER   AREG, A+5

would appear as

        (IS,04)  (1)  (S,01)+5

in variant I of IC. This does not particularly simplify the task of Pass II or save
much memory space. In such situations, it would have been preferable not to have
processed the operand field at all.
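
If the operand field is left in source form, Pass II has to evaluate the expression itself. The following Python sketch shows one way this could look for operands of the form symbol or symbol+displacement. The function name operand_address, the dictionary used as a symbol table and the address 209 are assumptions made only for this illustration.

```python
import re


def operand_address(expr, symtab):
    """Evaluate '<symbol>' or '<symbol>+<displacement>' against a symbol table."""
    match = re.fullmatch(r"\s*([A-Za-z]\w*)\s*(?:\+\s*(\d+))?\s*", expr)
    if match is None:
        raise ValueError(f"unsupported operand expression: {expr!r}")
    symbol = match.group(1)
    displacement = int(match.group(2) or 0)
    return symtab[symbol] + displacement


symtab = {"A": 209}                 # hypothetical address of A
address = operand_address("A+5", symtab)
plain = operand_address("A", symtab)
```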

4.4.5 Processing of Declarations and Assembler Directives


The focus of this discussion is on identifying alternative ways of processing declara-
tion statements and assembler directives. In this context, it is useful to consider how
far these statements can be processed in Pass I of the assembler. This depends on
answers to two related questions:
1. Is it necessary to represent the address of each source statement in IC?
2. Is it necessary to have an explicit representation of DS statements and
assembler directives in IC?
Let the answer to the first question be 'yes'. Now consider the following source
program fragment and its intermediate code:
        START   200      -)    (AD,01)  (C,200)
AREA1   DS      20       200)  (DL,02)  (C,20)
SIZE    DC      5        220)  (DL,01)  (C,5)
Here, it is redundant to have the representations of the START and DS statements
in IC, since the effect of these statements is implied in the fact that the DC
statement has the address 220! Thus, it is not necessary to have a representation for DS
statements and assembler directives in IC if the IC contains an address field. If the
address field of the IC is omitted, a representation for the DS statements and
assembler directives becomes essential. Now, Pass II can determine the address for SIZE
only after analyzing the intermediate code units for the START and DS statements.
The first alternative avoids this processing but requires the existence of the address
field. Yet another instance of the space-time tradeoff!
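
The second alternative, in which the IC has no address field, can be sketched in Python as below. The function name addresses_from_ic and the tuple layout of the IC units are assumptions; the sketch handles only START, DS and DC, and assumes that a DC occupies one word.

```python
def addresses_from_ic(ic_units):
    """Recompute the address of each IC unit when the IC has no address field.

    Each unit is (mnemonic_pair, operand), e.g. (("DL", 2), ("C", 20)).
    """
    loc_cntr = 0
    addresses = []
    for kind, operand in ic_units:
        if kind == ("AD", 1):            # START: set the location counter
            loc_cntr = operand[1]
            addresses.append(None)       # START itself occupies no memory
        elif kind == ("DL", 2):          # DS: reserve operand[1] words
            addresses.append(loc_cntr)
            loc_cntr += operand[1]
        elif kind == ("DL", 1):          # DC: assumed to occupy one word
            addresses.append(loc_cntr)
            loc_cntr += 1
        else:
            raise ValueError("unit kind not covered by this sketch")
    return addresses


ic = [(("AD", 1), ("C", 200)),     # START 200
      (("DL", 2), ("C", 20)),      # AREA1 DS 20
      (("DL", 1), ("C", 5))]       # SIZE  DC 5
addrs = addresses_from_ic(ic)
```

The address 220 of SIZE is obtained only after the START and DS units have been analyzed, which is the extra Pass II work the text refers to.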

DC statement
A DC statement must be represented in IC. The mnemonic field contains the pair
(DL,01). The operand field may contain the value of the constant in the source form
or in the internal machine representation. No processing advantage exists in either
case since conversion of the constant into the machine representation is required
anyway. If a DC statement defines many constants, e.g.

        DC      '5, 3, -7'
a series of (DL,01) units can be put in the IC.
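
A small Python sketch of this expansion follows. The function name dc_ic_units is hypothetical, and the sketch assumes decimal integer constants separated by commas inside a single pair of quotes.

```python
def dc_ic_units(operand):
    """Split the operand of a DC statement into one (DL,01) unit per constant."""
    constants = operand.strip().strip("'").split(",")
    return [(("DL", 1), ("C", int(c))) for c in constants]


units = dc_ic_units("'5, 3, -7'")
```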

START and ORIGIN


These directives set new values into the LC. It is not necessary to retain START and
ORIGIN statements in the IC if the IC contains an address field.

LTORG
Pass I checks for the presence of a literal reference in the operand field of every
statement. If one exists, it enters the literal in the current literal pool in LITTAB. When
an LTORG statement appears in the source program, it assigns memory addresses to
the literals in the current pool. These addresses are entered in the address field of
their LITTAB entries.
After performing this fundamental action, two alternatives exist concerning Pass
I processing. Pass I could simply construct an IC unit for the LTORG statement and
leave all subsequent processing to Pass II. Values of literals can be inserted in the
target program when this IC unit is processed in Pass II. This requires the use of
POOLTAB and LITTAB in a manner analogous to Pass I.
Example 4.4 Figure 4.9 shows the LITTAB and POOLTAB for the program of Fig. 4.8 at
the end of Pass I. Literals of the first pool are copied into the target program when the
IC unit for LTORG is encountered in Pass II. Literals of the second pool are copied into
the target program when the IC unit for END is processed.
Alternatively, Pass I could itself copy out the literals of the pool into the IC.
This avoids duplication of Pass I actions in Pass II. The IC for a literal can be made

identical to the IC for a DC statement so that no special processing is required in
Pass II.
Example 4.5 Figure 4.15 shows the IC for the first half of the program of Fig. 4.8. The
literals of the first pool (see Fig. 4.9) are copied out at the LTORG statement. Note that the
opcode field of the IC units, i.e. (DL,01), is the same as that for DC statements.

        START   200          (AD,01)  (C,200)
        MOVER   AREG, ='5'   (IS,04)  (1)(L,01)
        MOVEM   AREG, A      (IS,05)  (1)(S,01)
LOOP    MOVER   AREG, A      (IS,04)  (1)(S,01)

        BC      ANY, NEXT    (IS,07)  (6)(S,04)
        LTORG                (DL,01)  (C,5)
                             (DL,01)  (C,1)

Fig. 4.15 Copying of literal values into intermediate code

However, this alternative increases the tasks to be performed by Pass I,
consequently increasing its size. This might lead to an unbalanced pass structure for the
assembler with the consequences illustrated in Fig. 4.14. Secondly, the literals have
to exist in two forms simultaneously, in the LITTAB along with the address
information, and also in the intermediate code.
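
The LTORG processing described in this section, including the second alternative of copying literal values into the IC, can be sketched in Python as follows. The class name LiteralTables and its methods are hypothetical, POOLTAB holds 0-based indexes into LITTAB, each literal is assumed to occupy one word, and the location counter value 211 is chosen only for the illustration.

```python
class LiteralTables:
    """LITTAB and POOLTAB as used by Pass I (a simplified model)."""

    def __init__(self):
        self.littab = []      # entries: [literal value, address or None]
        self.pooltab = [0]    # index in littab of the first literal of each pool

    def enter(self, value):
        """Enter a literal in the current pool unless it is already there."""
        for i in range(self.pooltab[-1], len(self.littab)):
            if self.littab[i][0] == value:
                return i
        self.littab.append([value, None])
        return len(self.littab) - 1

    def ltorg(self, loc_cntr):
        """Assign addresses to the current pool; return (IC units, new LC)."""
        ic_units = []
        for entry in self.littab[self.pooltab[-1]:]:
            entry[1] = loc_cntr
            ic_units.append((("DL", 1), ("C", entry[0])))   # same form as a DC
            loc_cntr += 1
        self.pooltab.append(len(self.littab))               # start a new pool
        return ic_units, loc_cntr


tables = LiteralTables()
tables.enter(5)                    # ='5'
tables.enter(1)                    # ='1'
units, lc = tables.ltorg(211)      # LTORG with an assumed LC of 211
tables.enter(1)                    # ='1' again, now in the second pool
```

After the call to ltorg the literal values exist both in LITTAB and in the returned IC units, which is the duplication pointed out above.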

EXERCISE 4.4
1. Given the following source program:

        START   100
A       DS      3
L1      MOVER   AREG, B
        ADD     AREG, C
        MOVEM   AREG, D
D       EQU     A+1
L2      PRINT   D
        ORIGIN  A-1
C       DC      '5'
        ORIGIN  L2+1
        STOP
B       DC      '19'
        END     L1
(a) Show the contents of the symbol table at the end of Pass I.
(b) Explain the significance of EQU and ORIGIN statements in the program and
explain how they are processed by the assembler.
(c) Show the intermediate code generated for the program.
4.4.6 Pass II of the Assembler

Algorithm 4.2 is the algorithm for assembler Pass II. Minor changes may be needed
to suit the IC being used. It has been assumed that the target code is to be assembled
in the area named code_area.

Algorithm 4.2 (Assembler Second Pass)


1. code_area_address := address of code_area;
   pooltab_ptr := 1;
   loc_cntr := 0;
2. While next statement is not an END statement
   (a) Clear machine_code_buffer;
   (b) If an LTORG statement
       (i) Process literals in LITTAB [POOLTAB [pooltab_ptr]] ... LITTAB
           [POOLTAB [pooltab_ptr+1]]-1 similar to processing of constants
           in a DC statement, i.e. assemble the literals in machine_code_buffer.
       (ii) size := size of memory area required for literals;
       (iii) pooltab_ptr := pooltab_ptr + 1;
   (c) If a START or ORIGIN statement then
       (i) loc_cntr := value specified in operand field;
       (ii) size := 0;
   (d) If a declaration statement
       (i) If a DC statement then
           Assemble the constant in machine_code_buffer.
       (ii) size := size of memory area required by DC/DS;
   (e) If an imperative statement
       (i) Get operand address from SYMTAB or LITTAB.
       (ii) Assemble instruction in machine_code_buffer.
       (iii) size := size of instruction;
   (f) If size ≠ 0 then
       (i) Move contents of machine_code_buffer to the address
           code_area_address + loc_cntr;
       (ii) loc_cntr := loc_cntr + size;
3. (Processing of END statement)
   (a) Perform steps 2(b) and 2(f).
   (b) Write code_area into output file.
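
The steps of Algorithm 4.2 can be sketched in Python for variant I of the IC. This is an illustration under several assumptions, not the definitive implementation: the function name pass2 and the data layouts are hypothetical, every instruction and constant is assumed to occupy one word, LTORG and END are assumed to be coded as (AD,05) and (AD,02) following Fig. 4.11, and code_area is modelled as a dictionary from address to word instead of a memory area written to a file.

```python
def pass2(ic, symtab, littab, pooltab):
    """Sketch of Algorithm 4.2 for variant I IC.

    ic      : list of (mnemonic_pair, operands), ending with the END unit
    symtab  : list of symbol addresses; position n-1 holds symbol n
    littab  : list of (value, address) pairs
    pooltab : 0-based index in littab of the first literal of each pool,
              followed by len(littab) as a sentinel
    Returns {address: (opcode, register, operand)} standing for code_area.
    """
    code_area = {}
    loc_cntr = 0
    pool = 0

    def emit(words):
        nonlocal loc_cntr
        for word in words:                  # step 2(f): buffer -> code_area
            code_area[loc_cntr] = word
            loc_cntr += 1

    def literal_pool():
        nonlocal pool                       # step 2(b): assemble like DC
        first, last = pooltab[pool], pooltab[pool + 1]
        pool += 1
        return [(0, 0, value) for value, _ in littab[first:last]]

    for kind, operands in ic:
        buffer = []                         # step 2(a)
        if kind in (("AD", 5), ("AD", 2)):          # LTORG, or END (step 3)
            buffer = literal_pool()
        elif kind in (("AD", 1), ("AD", 3)):        # START or ORIGIN
            loc_cntr = operands[0][1]
        elif kind == ("DL", 1):                     # DC
            buffer = [(0, 0, operands[0][1])]
        elif kind == ("DL", 2):                     # DS: reserve memory only
            loc_cntr += operands[0][1]
        elif kind[0] == "IS":                       # imperative statement
            register, address = 0, 0
            for operand in operands:
                if isinstance(operand, int):        # register / condition code
                    register = operand
                elif operand[0] == "S":
                    address = symtab[operand[1] - 1]
                elif operand[0] == "L":
                    address = littab[operand[1] - 1][1]
            buffer = [(kind[1], register, address)]
        emit(buffer)
    return code_area


ic = [
    (("AD", 1), [("C", 200)]),        #      START 200
    (("IS", 9), [("S", 1)]),          #      READ  A
    (("IS", 4), [1, ("S", 1)]),       #      MOVER AREG, A
    (("IS", 2), [1, ("L", 1)]),       #      SUB   AREG, ='1'
    (("IS", 0), []),                  #      STOP
    (("DL", 2), [("C", 1)]),          # A    DS    1
    (("AD", 2), []),                  #      END
]
code_area = pass2(ic, symtab=[204], littab=[(1, 205)], pooltab=[0, 1])
```

In this small program the symbol A is at address 204 and the literal ='1' is placed at 205 when the END unit is processed, so word 204 is reserved but never written.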
Output interface of the assembler

It has been assumed that the assembler produces a target program which is the
machine language of the target computer. This is rarely (if ever!) the case. The
assembler produces an object module in the format required by a linkage editor or loader.
The information contained in object modules is discussed in Chapter 7.

4.4.7 Listing and Error Reporting
Design of an error indication scheme involves some decisions which influence the
effectiveness of error reporting and the speed and memory requirements of the
assembler. The basic decision is whether to produce program listing and error reports
in Pass I or delay these actions until Pass II. Producing the listing in the first pass
has the advantage that the source program need not be preserved till Pass II. This
conserves memory and avoids some amount of duplicate processing.
This design decision also has very important implications from a programmer's
viewpoint. A listing produced in Pass I can report only certain errors in the most
relevant place, that is, against the source statement itself. Examples of such errors are
syntax errors like missing commas or parentheses and semantic errors like duplicate
definitions of symbols. Other errors like references to undefined variables can only
be reported at the end of the source program (see Fig. 4.16). The target code can be
printed later in Pass II; however, it is difficult to locate the target code corresponding
to a source statement and vice versa. All these factors make debugging difficult.

Sr. No.       Statement              Address

001           START   200
002           MOVER   AREG, A        200
003
009           MVER    BREG, A        207
              ** error ** Invalid opcode
010           ADD     BREG, B        208
014     A     DS      1              209
015
021     A     DC      '5'            227
              ** error ** Duplicate definition of symbol A
022
035           END
              ** error ** Undefined symbol B in statement 10

Fig. 4.16 Error reporting in Pass I

For effective error reporting, it is necessary to report all errors against the
erroneous statement itself. This can be achieved by delaying program listing and error
reporting till Pass II. Now the error reports as well as the target code can be printed
against each source statement (see Ex. 4.6).

Example 4.6 Figure 4.16 illustrates error reporting in Pass I. Detection of errors in
statements 9 and 21 is straightforward. In statement 9, the opcode is known to be invalid
because it does not match with any mnemonic in OPTAB. In statement 21, A is known
to be a duplicate definition because an entry for A already exists in the symbol table.
Use of the undefined symbol B is harder to detect because at the end of Pass I we
have no record that a forward reference to B exists in statement 10. This problem can
be resolved by making an entry for B in the symbol table with an indication that a
forward reference to B exists in statement 10. All such entries would be processed at
the end of Pass I to check if a definition of the symbol has been encountered. If not,
the symbol table entry contains sufficient information for error reporting. Note that
the target instructions cannot be printed because they have not yet been generated.
The memory address is printed against each statement in a weak attempt to provide a
cross-reference between source statements and target instructions.
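
The check made at the end of Pass I can be sketched in Python as follows. The function name undefined_symbol_errors and the dictionary layout of the symbol table entries (with the hypothetical fields address and first_ref) are assumptions made for this illustration.

```python
def undefined_symbol_errors(symtab):
    """Report symbols that were referenced but never defined.

    symtab maps a name to a dict with 'address' (None while undefined) and
    'first_ref' (number of the first statement that refers to the symbol).
    """
    errors = []
    for name, entry in symtab.items():
        if entry["address"] is None:
            errors.append(
                f"** error ** Undefined symbol {name} in statement {entry['first_ref']}"
            )
    return errors


symtab = {
    "A": {"address": 209, "first_ref": 2},
    "B": {"address": None, "first_ref": 10},   # forward reference, never defined
}
report = undefined_symbol_errors(symtab)
```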

Example 4.7 Figure 4.17 illustrates error reporting performed in Pass II. Indication of errors
in statements 9 and 21 is as easy as in Ex. 4.6. Indication of the error in statement 10
is equally easy: the symbol table is searched for an entry of B and an error is reported
when no matching entry is found. Note that target program instructions appear against
the source statements to which they belong.

Sr. No.       Statement              Address    Instruction

001           START   200
002           MOVER   AREG, A        200        + 04 1 209
003
009           MVER    BREG, A        207        + -- 2 209
              ** error ** Invalid opcode
010           ADD     BREG, B        208        + 01 2 ---
              ** error ** Undefined symbol B in operand field
014     A     DS      1              209
015
021     A     DC      '5'            227        + 00 0 005
              ** error ** Duplicate definition of symbol A
022
035           END

Fig. 4.17 Error reporting in Pass II
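
The Pass II treatment of the operand in statement 10 of Fig. 4.17 can be sketched in Python as below. The function name assemble_operand and the dictionary used as a symbol table are assumptions; the sketch only produces the address text and the error line for one symbolic operand.

```python
def assemble_operand(symbol, symtab):
    """Return (address text, error message or None) for a symbolic operand."""
    address = symtab.get(symbol)
    if address is None:
        return "---", f"** error ** Undefined symbol {symbol} in operand field"
    return f"{address:03d}", None


symtab = {"A": 209}
ok = assemble_operand("A", symtab)
bad = assemble_operand("B", symtab)
```

Because the lookup happens while the statement itself is being listed, the error line can be printed directly under that statement.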
