
CS2304 SYSTEM SOFTWARE

UNIT I INTRODUCTION 8
System software and machine architecture - The Simplified Instructional Computer (SIC)
- Machine architecture - Data and instruction formats - Addressing modes - Instruction
sets - I/O and programming.
UNIT II ASSEMBLERS 10
Basic assembler functions - A simple SIC assembler - Assembler algorithm and data
structures - Machine-dependent assembler features - Instruction formats and addressing
modes - Program relocation - Machine-independent assembler features - Literals -
Symbol-defining statements - Expressions - One-pass assemblers and multi-pass
assemblers - Implementation example - MASM assembler.
UNIT III LOADERS AND LINKERS 9
Basic loader functions - Design of an absolute loader - A simple bootstrap loader -
Machine-dependent loader features - Relocation - Program linking - Algorithm and
data structures for linking loader - Machine-independent loader features - Automatic
library search - Loader options - Loader design options - Linkage editors - Dynamic
linking - Bootstrap loaders - Implementation example - MS-DOS linker.
UNIT IV MACRO PROCESSORS 9
Basic macro processor functions - Macro definition and expansion - Macro processor
algorithm and data structures - Machine-independent macro processor features -
Concatenation of macro parameters - Generation of unique labels - Conditional macro
expansion - Keyword macro parameters - Macro within macro - Implementation example -
MASM macro processor - ANSI C macro language.
UNIT V SYSTEM SOFTWARE TOOLS 9
Text editors - Overview of the editing process - User interface - Editor structure -
Interactive debugging systems - Debugging functions and capabilities - Relationship with
other parts of the system - User-interface criteria.
TEXT BOOK
1. Leland L. Beck, "System Software - An Introduction to Systems Programming",
3rd Edition, Pearson Education Asia, 2006.
REFERENCES
1. D. M. Dhamdhere, "Systems Programming and Operating Systems", Second
Revised Edition, Tata McGraw-Hill, 2000.
2. John J. Donovan, "Systems Programming", Tata McGraw-Hill Edition, 2000.
UNIT I
INTRODUCTION TO SYSTEM SOFTWARE AND
MACHINE STRUCTURE
1.1 SYSTEM SOFTWARE
System software consists of a variety of programs that support the operation of a
computer.
It is a set of programs that perform a variety of system functions such as file editing,
resource management, I/O management and storage management.
The characteristic in which system software differs from application software is
machine dependency.
An application program is primarily concerned with the solution of some
problem, using the computer as a tool.
System programs, on the other hand, are intended to support the operation and use
of the computer itself, rather than any particular application.
For this reason, they are usually related to the architecture of the machine on
which they are run.
For example, assemblers translate mnemonic instructions into machine code. The
instruction formats and addressing modes are of direct concern in assembler design.
There are some aspects of system software that do not directly depend upon the
type of computing system being supported. These are known as machine-independent
features.
For example, the general design and logic of an assembler is basically the same
on most computers.

TYPES OF SYSTEM SOFTWARE:
1. Operating system
2. Language translators
a. Compilers
b. Interpreters
c. Assemblers
d. Preprocessors
3. Loaders
4. Linkers
5. Macro processors
OPERATING SYSTEM
It is the most important system program. It acts as an interface between the users
and the system, and it makes the computer easier to use.
It provides an interface that is more user-friendly than the underlying hardware.
The functions of the OS are:
1. Process management
2. Memory management
3. Resource management
4. I/O operations
5. Data management
6. Providing security for the user's job.
LANGUAGE TRANSLATORS
It is a program that takes an input program in one language and produces an output in
another language.
Source Program -> Language Translator -> Object Program
Compilers
A compiler is a language program that translates programs written in any high-level
language into an equivalent machine language program.
It bridges the semantic gap between a programming language domain and the
execution domain.
Two aspects of compilation are:
o Generate code to implement the meaning of a source program in the execution
domain.
o Provide diagnostics for violations of programming language semantics in a
source program.
The program instructions are taken as a whole.
High level language -> Compiler -> Machine language program
Interpreters:
It is a translator program that translates a statement of a high-level language to
machine language and executes it immediately. The program instructions are
taken line by line.
The interpreter reads the source program and stores it in memory.
During interpretation, it takes a source statement, determines its meaning and
performs actions which implement it. This includes computational and I/O
actions.
The program counter (PC) indicates which statement of the source program is to be
interpreted next. This statement is subjected to the interpretation cycle.
The interpretation cycle consists of the following steps:
o Fetch the statement.
o Analyze the statement and determine its meaning.
o Execute the meaning of the statement.
The following are the characteristics of interpretation:
o The source program is retained in the source form itself; no target program
exists.
o A statement is analyzed during the interpretation.
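The fetch, analyze and execute steps of the interpretation cycle can be illustrated with a small sketch. The three-statement toy language (SET, ADD, PRINT) and the function name interpret are assumptions made only for this illustration; they are not part of any real interpreter.

```python
def interpret(source_lines):
    """Run a toy program one statement at a time; no target program is produced."""
    variables = {}
    output = []
    pc = 0  # index of the next source statement to interpret
    while pc < len(source_lines):
        statement = source_lines[pc]      # fetch the statement
        op, *args = statement.split()     # analyze it and determine its meaning
        if op == "SET":                   # execute the meaning
            variables[args[0]] = int(args[1])
        elif op == "ADD":
            variables[args[0]] += variables[args[1]]
        elif op == "PRINT":
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown statement: {statement}")
        pc += 1
    return output
```

Note that the source stays in its original form throughout: each statement is re-analyzed every time it is reached.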
[Figure: the interpreter, with its program counter, operating on the source program held in memory]
Assemblers:
Programmers found it difficult to write or read programs in machine language. In a
quest for a convenient language, they began to use a mnemonic (symbol) for each
machine instruction, which would subsequently be translated into machine
language.
Such a mnemonic language is called assembly language.
Programs known as assemblers are written to automate the translation of
assembly language into machine language.
Assembly language program -> Assembler -> Machine language program
Fundamental functions:
1. Translating mnemonic operation codes to their machine language equivalents.
2. Assigning machine addresses to symbolic labels used by the programmers.
1.2 THE SIMPLIFIED INSTRUCTIONAL COMPUTER (SIC):
It is similar to a typical microcomputer. It comes in two versions:
The standard model
The XE version
SIC Machine Structure:
Memory:
It consists of bytes (8 bits) and words (24 bits, formed by 3 consecutive bytes).
Words are addressed by the location of their lowest-numbered byte.
There are 32,768 bytes of memory in total.
Registers:
There are 5 registers, namely
1. Accumulator (A)
2. Index Register (X)
3. Linkage Register (L)
4. Program Counter (PC)
5. Status Word (SW).
The accumulator is a special purpose register used for arithmetic operations.
The index register is used for addressing.
The linkage register stores the return address of the jump to subroutine instruction
(JSUB).
The program counter contains the address of the next instruction to be fetched for
execution.
The status word contains a variety of information including the condition code.
Data formats:
Integers are stored as 24-bit binary numbers; 2's complement representation is
used for negative values. Characters are stored using their 8-bit ASCII codes.
The standard version does not support floating point data items.
Instruction formats:
All machine instructions are 24 bits wide.
Opcode (8) | X (1) | Address (15)
The X flag bit is used to indicate indexed addressing mode.
Addressing modes:
Two types of addressing are available, namely,
1. Direct addressing mode
2. Indexed addressing mode
Mode      Indication   Target address calculation
Direct    x = 0        TA = address
Indexed   x = 1        TA = address + (X)
where (X) represents the contents of the index register X.
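As a minimal sketch of the format and the two addressing modes above, the following function splits a 24-bit instruction word into its fields and computes the target address. The function name decode_sic and the x_register parameter are illustrative assumptions.

```python
def decode_sic(word, x_register=0):
    """Split a 24-bit standard SIC instruction into opcode, x flag and target address."""
    opcode = (word >> 16) & 0xFF      # leftmost 8 bits
    x_flag = (word >> 15) & 0x1       # 1 selects indexed addressing
    address = word & 0x7FFF           # rightmost 15 bits
    target = address + x_register if x_flag else address
    return opcode, x_flag, target
```

For instance, decode_sic(0x549039, x_register=5) treats the word as indexed and adds the contents of X to the 15-bit address 1039.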
Instruction set:
It includes instructions like:
1. Data movement instructions
Ex: LDA, LDX, STA, STX.
2. Arithmetic instructions
Ex: ADD, SUB, MUL, DIV.
These involve register A and a word in memory, with the result being left in the
register.
3. Branching instructions
Ex: JLT, JEQ, JGT.
4. Subroutine linkage instructions
Ex: JSUB, RSUB.
Input and Output:
I/O is performed by transferring one byte at a time to or from the rightmost 8 bits
of register A.
Each device is assigned a unique 8-bit code.
There are 3 I/O instructions:
1) The Test Device (TD) instruction tests whether the addressed device is
ready to send or receive a byte of data.
2) A program must wait until the device is ready, and then execute a Read
Data (RD) or Write Data (WD).
3) The sequence must be repeated for each byte of data to be read or written.
1.3 SIC/XE ARCHITECTURE & SYSTEM SPECIFICATION
Memory:
1 word = 24 bits (3 8-bit bytes)
Total (SIC/XE) = 2^20 (1,048,576) bytes (1 Mbyte)
Registers:
9 registers, each 24 bits long except F (48 bits)
MNEMONIC  Number  Purpose
A         0       Accumulator
X         1       Index register
L         2       Linkage register (JSUB/RSUB)
B         3       Base register
S         4       General register
T         5       General register
F         6       Floating Point Accumulator (48 bits)
PC        8       Program Counter (PC)
SW        9       Status Word (includes Condition Code, CC)


Data Format:
Integers are stored in 24-bit, 2's complement format
Characters are stored in 8-bit ASCII format
Floating point is stored in 48-bit signed-exponent-fraction format:

s (1) | exponent (11) | fraction (36)

The fraction is represented as a 36-bit number and has a value between 0 and 1.
The exponent is represented as an 11-bit unsigned binary number between 0 and
2047.
The sign of the floating point number is indicated by s: 0 = positive, 1 = negative.
Therefore, the absolute value of the floating point number is: f * 2^(e - 1024)
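The layout above can be checked with a short sketch that unpacks the three fields from a 48-bit pattern and applies f * 2^(e - 1024). The function name decode_sicxe_float is an assumption made for illustration, and the result is returned as an ordinary Python float, so very long fractions may lose precision.

```python
def decode_sicxe_float(bits):
    """Decode a 48-bit SIC/XE floating-point pattern given as an integer."""
    sign = (bits >> 47) & 0x1                          # 1 bit
    exponent = (bits >> 36) & 0x7FF                    # 11 bits, unsigned
    fraction = (bits & ((1 << 36) - 1)) / (1 << 36)    # 36 bits, value in [0, 1)
    magnitude = fraction * 2.0 ** (exponent - 1024)
    return -magnitude if sign else magnitude
```

With exponent 1025 and only the leading fraction bit set, the value is 0.5 * 2^1 = 1.0.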
Instruction Format:
There are 4 different instruction formats available:
Format 1 (1 byte):
op (8)

Format 2 (2 bytes):
op (8) | r1 (4) | r2 (4)

Format 3 (3 bytes):
op (6) | n | i | x | b | p | e | displacement (12)

Format 4 (4 bytes):
op (6) | n | i | x | b | p | e | address (20)

Formats 3 & 4 introduce addressing mode flag bits:
n=0 & i=1
Immediate addressing - TA is used as the operand value (no memory reference)
n=1 & i=0
Indirect addressing - the word at TA (in memory) is fetched & used as the address to
fetch the operand from
n=0 & i=0
Simple addressing - TA is the location of the operand
n=1 & i=1
Simple addressing - same as n=0 & i=0
Flag x:
x=1 Indexed addressing - add contents of X register to TA calculation

Flag b & p (Format 3 only):
b=0 & p=0
Direct addressing - displacement/address field contains TA (Format 4 always uses
direct addressing)
b=0 & p=1
PC relative addressing - TA = (PC) + disp (-2048 <= disp <= 2047)
b=1 & p=0
Base relative addressing - TA = (B) + disp (0 <= disp <= 4095)
Flag e:
e=0 use Format 3
e=1 use Format 4
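A small sketch of how the six flag bits could be read back out of an assembled Format 3 or Format 4 instruction. The function name decode_flags and the wording of the returned labels are assumptions for illustration only.

```python
def decode_flags(instruction_hex):
    """Return the n, i, x, b, p, e flags of a Format 3/4 instruction given as hex."""
    raw = bytes.fromhex(instruction_hex)
    n = (raw[0] >> 1) & 1                 # last two bits of the first byte
    i = raw[0] & 1
    x, b, p, e = ((raw[1] >> shift) & 1 for shift in (7, 6, 5, 4))
    kind = {(0, 1): "immediate", (1, 0): "indirect"}.get((n, i), "simple")
    if n == 0 and i == 0:
        base = "direct (SIC-compatible)"  # b, p, e belong to the address here
    elif e:
        base = "direct (Format 4)"
    elif p:
        base = "PC-relative"
    elif b:
        base = "base-relative"
    else:
        base = "direct"
    return {"n": n, "i": i, "x": x, "b": b, "p": p, "e": e,
            "type": kind, "base": base}
```

For example, decode_flags("17202D") reports simple, PC-relative addressing, and decode_flags("4B101036") reports Format 4.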
Instructions:
SIC provides 26 instructions; SIC/XE provides an additional 33 instructions (59 total).
SIC/XE instructions fall into the following categories:
Load/store registers (LDA, LDX, LDCH, STA, STX, STCH, etc.)
Integer arithmetic operations (ADD, SUB, MUL, DIV) - these use register A
and a word in memory; results are placed into register A
Compare (COMP) - compares the contents of register A with a word in memory and
sets CC (Condition Code) to <, >, or =
Conditional jumps (JLT, JEQ, JGT) - jump according to the setting of CC
Subroutine linkage (JSUB, RSUB) - jump into/return from a subroutine using
register L
Input & output control (RD, WD, TD) - see next section
Floating point arithmetic operations (ADDF, SUBF, MULF, DIVF)
Register manipulation, operands-from-registers, and register-to-register arithmetic
(RMO, COMPR, SHIFTR, SHIFTL, ADDR, SUBR, MULR, DIVR, etc.)

Input and Output (I/O):

2^8 (256) I/O devices may be attached; each has its own unique 8-bit address
1 byte of data is transferred to/from the rightmost 8 bits of register A
Three I/O instructions are provided:
RD Read Data from I/O device into A
WD Write Data to I/O device from A
TD Test Device determines if the addressed I/O device is ready to send/receive a byte
of data. The CC (Condition Code) gets set with the result of this test:
< device is ready to send/receive
= device isn't ready

SIC/XE also has I/O channels (an I/O device may input/output data while the CPU
does other work) - 3 additional instructions are provided:
SIO Start I/O
HIO Halt I/O
TIO Test I/O
1.4 SIC, SIC/XE ADDRESSING MODES
Addressing  Flag bits      Assembler   Calculation of         Operand   Notes
type        n i x b p e    notation    target address (TA)
Simple      1 1 0 0 0 0    op c        disp                   (TA)      D
            1 1 0 0 0 1    +op m       addr                   (TA)      4 D
            1 1 0 0 1 0    op m        (PC) + disp            (TA)      A
            1 1 0 1 0 0    op m        (B) + disp             (TA)      A
            1 1 1 0 0 0    op c,X      disp + (X)             (TA)      D
            1 1 1 0 0 1    +op m,X     addr + (X)             (TA)      4 D
            1 1 1 0 1 0    op m,X      (PC) + disp + (X)      (TA)      A
            1 1 1 1 0 0    op m,X      (B) + disp + (X)       (TA)      A
            0 0 0 - - -    op m        b/p/e/disp             (TA)      D S
            0 0 1 - - -    op m,X      b/p/e/disp + (X)       (TA)      D S
Indirect    1 0 0 0 0 0    op @c       disp                   ((TA))    D
            1 0 0 0 0 1    +op @m      addr                   ((TA))    4 D
            1 0 0 0 1 0    op @m       (PC) + disp            ((TA))    A
            1 0 0 1 0 0    op @m       (B) + disp             ((TA))    A
Immediate   0 1 0 0 0 0    op #c       disp                   TA        D
            0 1 0 0 0 1    +op #m      addr                   TA        4 D
            0 1 0 0 1 0    op #m       (PC) + disp            TA        A
            0 1 0 1 0 0    op #m       (B) + disp             TA        A
Notes:
D = Direct-addressing instruction
4 = Format 4 instruction
A = Assembler selects either base-relative or program-counter relative mode
S = SIC compatible format
UNIT II
ASSEMBLERS
2.1. BASIC ASSEMBLER FUNCTIONS
Fundamental functions of an assembler:
Translating mnemonic operation codes to their machine language equivalents.
Assigning machine addresses to symbolic labels used by the programmer.
Figure 2.1: Assembler language program for basic SIC version
Indexed addressing is indicated by adding the modifier ",X" following the operand.
Lines beginning with "." contain comments only.
The following assembler directives are used:
START: Specify name and starting address for the program.
END: Indicate the end of the source program and specify the first executable
instruction in the program.
BYTE: Generate character or hexadecimal constant, occupying as many bytes as
needed to represent the constant.
WORD: Generate one-word integer constant.
RESB: Reserve the indicated number of bytes for a data area.
RESW: Reserve the indicated number of words for a data area.
The program contains a main routine that reads records from an input device (code F1)
and copies them to an output device (code 05).
The main routine calls subroutines:
RDREC To read a record into a buffer.
WRREC To write the record from the buffer to the output device.
The end of each record is marked with a null character (hexadecimal 00).
2.1.1. A Simple SIC Assembler
The translation of source program to object code requires the following functions:
1. Convert mnemonic operation codes to their machine language equivalents. Eg:
Translate STL to 14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses. Eg: Translate
RETADR to 1033 (line 10).
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal
machine representations. Eg: Translate EOF to 454F46 (line 80).
5. Write the object program and the assembly listing.
All functions except function 2 can be accomplished by sequential processing of the source
program one line at a time.
Consider the statement
10 1000 FIRST STL RETADR 141033
This instruction contains a forward reference, i.e. a reference to a label (RETADR) that
is defined later in the program. The assembler is unable to process this line because the
address that will be assigned to RETADR is not yet known. Hence most assemblers make two
passes over the source program, where the second pass does the actual translation.
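The STL and EOF translations mentioned above can be reproduced with a minimal sketch. The three-entry OPTAB excerpt and the function names assemble and byte_constant are assumptions for illustration; the symbol table is passed in already filled, which is exactly what a forward reference prevents in a single sequential scan.

```python
OPTAB = {"STL": 0x14, "LDA": 0x00, "JSUB": 0x48}   # small excerpt of the SIC opcodes

def assemble(mnemonic, operand, symtab):
    """Build the 24-bit object code for one standard SIC instruction."""
    indexed = operand.endswith(",X")
    name = operand[:-2] if indexed else operand
    if name not in symtab:
        raise KeyError(f"undefined symbol (forward reference?): {name}")
    word = (OPTAB[mnemonic] << 16) | (int(indexed) << 15) | symtab[name]
    return f"{word:06X}"

def byte_constant(spec):
    """Convert a BYTE operand such as C'EOF' or X'F1' to hexadecimal object code."""
    kind, text = spec[0], spec[2:-1]
    return text.encode("ascii").hex().upper() if kind == "C" else text.upper()
```

Calling assemble before RETADR is in the table raises an error, which is the situation the second pass avoids.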
The assembler must also process statements called assembler directives or pseudo
instructions, which are not translated into machine instructions. Instead they provide
instructions to the assembler itself.
Examples: RESB and RESW instruct the assembler to reserve memory locations without
generating data values.
The assembler must write the generated object code onto some output device. This object
program will later be loaded into memory for execution.
Object program format contains three types of records:
Header record: Contains the program name, starting address and length.
Text record: Contains the machine code and data of the program.
End record: Marks the end of the object program and specifies the address in the
program where execution is to begin.
Record format is as follows:
Header record:
Col. 1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program
Col. 14-19 Length of object program in bytes
Text record:
Col. 1 T
Col. 2-7 Starting address for object code in this record
Col. 8-9 Length of object code in this record in bytes
Col. 10-69 Object code, represented in hexadecimal (2 columns per byte of object
code)
End record:
Col. 1 E
Col. 2-7 Address of first executable instruction in object program.
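The column layouts above map directly onto fixed-width string formatting. The following is a sketch only; the function names are illustrative assumptions, and the records are written without the field separators that some listings show for readability.

```python
def header_record(name, start, length):
    """Col. 1 'H', Col. 2-7 name (padded), Col. 8-13 start, Col. 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    """Col. 1 'T', Col. 2-7 start, Col. 8-9 length in bytes, Col. 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_instruction):
    """Col. 1 'E', Col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"
```

A program named COPY starting at 1000 with length 107A would get the header HCOPY  00100000107A.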
Functions of the two passes of assembler:
Pass 1 (Define symbols)
1. Assign addresses to all statements in the program.
2. Save the addresses assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.
Pass 2 (Assemble instructions and generate object programs)
1. Assemble instructions (translating operation codes and looking up addresses).
2. Generate data values defined by BYTE, WORD etc.
3. Perform processing of assembler directives not done in Pass 1.
4. Write the object program and the assembly listing.
2.1.2. Assembler Algorithm and Data Structures
The assembler uses two major internal data structures:
1. Operation Code Table (OPTAB): Used to look up mnemonic operation codes
and translate them into their machine language equivalents.
2. Symbol Table (SYMTAB): Used to store values (addresses) assigned to labels.
Location Counter (LOCCTR) :
Variable used to help in the assignment of addresses.
It is initialized to the beginning address specified in the START statement.
After each source statement is processed, the length of the assembled instruction
or data area is added to LOCCTR.
Whenever a label is reached in the source program, the current value of LOCCTR
gives the address to be associated with that label.
Operation Code Table (OPTAB) :
Contains the mnemonic operation code and its machine language equivalent.
Also contains information about instruction format and length.
In Pass 1, OPTAB is used to look up and validate operation codes in the source
program.
In Pass 2, it is used to translate the operation codes to machine language.
During Pass 2, the information in OPTAB tells which instruction format to use in
assembling the instruction and any peculiarities of the object code instruction.
Symbol Table (SYMTAB) :
Includes the name and value for each label in the source program and flags to
indicate error conditions.
During Pass 1 of the assembler, labels are entered into SYMTAB as they are
encountered in the source program along with their assigned addresses.
During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the
addresses to be inserted in the assembled instructions.
Pass 1 usually writes an intermediate file that contains each source statement together
with its assigned address and error indicators. This file is used as the input to Pass 2. This
copy of the source program can also be used to retain the results of certain operations that
may be performed during Pass 1, such as scanning the operand field for symbols and
addressing flags, so these need not be performed again during Pass 2.
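A simplified sketch of Pass 1 for the standard SIC machine, showing how LOCCTR drives the entries in SYMTAB and the intermediate file. The function name pass_one and the (label, opcode, operand) tuple input are assumptions for illustration; OPTAB validation and error flags are left out to keep the sketch short.

```python
def pass_one(lines):
    """Assign addresses and build SYMTAB from (label, opcode, operand) tuples."""
    symtab, intermediate = {}, []
    locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = int(operand, 16)        # starting address is in hexadecimal
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr           # label gets the current LOCCTR
        if opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            text = operand[2:-1]
            locctr += len(text) if operand[0] == "C" else len(text) // 2
        elif opcode != "END":
            locctr += 3                      # every standard SIC instruction is 3 bytes
    return symtab, intermediate, locctr
```

The returned intermediate list plays the role of the intermediate file: each statement carries the address assigned to it, ready for Pass 2.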
2.2. MACHINE DEPENDENT ASSEMBLER FEATURES
Consider the design and implementation of an assembler for the SIC/XE version.
Indirect addressing is indicated by adding the prefix @ to the operand (line 70).
Immediate operands are denoted with the prefix # (lines 25, 55, 133). Instructions that
refer to memory are normally assembled using either the program-counter relative or
base relative mode.
The assembler directive BASE (line 13) is used in conjunction with base relative
addressing. The four-byte extended instruction format is specified with the prefix + added
to the operation code in the source statement.
Register-to-register instructions are used wherever possible. For example, the statement
on line 150 is changed from COMP ZERO to COMPR A,S. Immediate and indirect
addressing have also been used as much as possible.
Register-to-register instructions are faster than the corresponding register-to-memory
operations because they are shorter and do not require another memory reference.
While using immediate addressing, the operand is already present as part of the
instruction and need not be fetched from anywhere. The use of indirect addressing often
avoids the need for another instruction.
2.2.1 Instruction Formats and Addressing Modes
SIC/XE
o PC-relative or base-relative addressing: op m
o Indirect addressing: op @m
o Immediate addressing: op #c
o Extended format: +op m
o Index addressing: op m,X
o Register-to-register instructions
o Larger memory -> multi-programming (program allocation)
Translation
Register translation
o Register names (A, X, L, B, S, T, F, PC, SW) and their values (0, 1, 2, 3, 4,
5, 6, 8, 9)
o Preloaded in SYMTAB
Address translation
o Most register-memory instructions use program-counter relative or base
relative addressing
o Format 3: 12-bit address field
base-relative: 0 to 4095
PC-relative: -2048 to 2047
o Format 4: 20-bit address field
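The choice between the two relative modes can be sketched as follows: try PC-relative first, fall back to base-relative if a base value is known, and otherwise report that Format 4 is needed. The function name assemble_format3 and its parameters are assumptions for illustration, and only simple addressing (n = i = 1) is covered.

```python
def assemble_format3(opcode, target, pc, base=None, indexed=False):
    """Assemble a simple-addressing (n=i=1) Format 3 instruction as hex.

    pc is the address of the NEXT instruction; base is the value the
    assembler believes register B holds, or None if no BASE is in effect.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        b, p = 0, 1                                   # PC-relative
    elif base is not None and 0 <= target - base <= 4095:
        disp, b, p = target - base, 1, 0              # base-relative
    else:
        raise ValueError("displacement out of range; use Format 4 (+op)")
    flags = (int(indexed) << 3) | (b << 2) | (p << 1)  # x b p e, with e = 0
    word = ((opcode | 0b11) << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"
```

A negative PC-relative displacement is stored in 12-bit 2's complement form, which is what the mask with 0xFFF does.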
2.2.2 Program Relocation
The need for program relocation
It is desirable to load and run several programs at the same time.
The system must be able to load programs into memory wherever there is room.
The exact starting address of the program is not known until load time.
Absolute Program
Program with starting address specified at assembly time
The addresses may be invalid if the program is loaded somewhere else.
Example: Program Relocation
The only parts of the program that require modification at load time are those that
specify direct addresses.
The rest of the instructions need not be modified.
o Not a memory address (immediate addressing)
o PC-relative, Base-relative
From the object program, it is not possible to distinguish addresses from constants.
o The assembler must keep some information to tell the loader.
o The object program that contains the modification records is called a
relocatable program.
The way to solve the relocation problem
For an address label, its address is assigned relative to the start of the
program (START 0)
Produce a Modification record to store the starting location and the length of the
address field to be modified.
The command for the loader must also be a part of the object program.
Modification record
One modification record for each address to be modified
The length is stored in half-bytes (4 bits)
The starting location is the location of the byte containing the leftmost bits of the
address field to be modified.
If the field contains an odd number of half-bytes, the starting location begins in
the middle of the first byte.
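As a hedged sketch of both sides of relocation, the first function below produces the Modification record for a Format 4 instruction (a 5 half-byte address field that starts one byte after the opcode), and the second shows how a loader might apply such a record. The function names and the bytearray memory model are assumptions for illustration.

```python
def format4_modification_record(instruction_address):
    """Record for the 20-bit address field of a Format 4 instruction."""
    return f"M{instruction_address + 1:06X}05"

def apply_modification(memory, record, load_address):
    """Add load_address to the field described by a Modification record."""
    start = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    nbytes = (half_bytes + 1) // 2               # bytes touched by the field
    mask = (1 << (4 * half_bytes)) - 1           # rightmost half-bytes hold the field
    value = int.from_bytes(memory[start:start + nbytes], "big")
    field = ((value & mask) + load_address) & mask
    value = (value & ~mask) | field              # keep the leading half-byte intact
    memory[start:start + nbytes] = value.to_bytes(nbytes, "big")
```

With an odd number of half-bytes the mask leaves the high half of the first byte (the x, b, p, e flags) untouched.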
Relocatable Object Program
2.3. MACHINE INDEPENDENT ASSEMBLER FEATURES
2.3.1 Literals
It is convenient to write the value of a constant operand as a part of the instruction
that uses it. This avoids having to define the constant elsewhere in the program
and make up a label for it.
Such an operand is called a literal because the value is stated literally in the instruction.
A literal is identified with the prefix =, followed by a specification of the literal
value.
Example: LDA =C'EOF'
Literals vs. Immediate Operands
Literals
The assembler generates the specified value as a constant at some other memory
location.
Immediate operands
The operand value is assembled as part of the machine instruction.
We can have literals in SIC, but immediate operands are only valid in SIC/XE.
Literal Pools
Normally literals are placed into a pool at the end of the program
In some cases, it is desirable to place literals into a pool at some other location in
the object program
Assembler directive LTORG
o When the assembler encounters a LTORG statement, it generates a literal
pool (containing all literal operands used since the previous LTORG)
Reason: keep the literal operand close to the instruction
o Otherwise PC-relative addressing may not be allowed
Duplicate literals
The same literal used more than once in the program
o Only one copy of the specified value needs to be stored
o For example, =X'05'
In order to recognize the duplicate literals
o Compare the character strings defining them
Easier to implement, but has a potential problem
e.g. =X'05'
o Compare the generated data value
Better, but will increase the complexity of the
assembler
e.g. =C'EOF' and =X'454F46'
Problem of duplicate-literal recognition
'*' denotes a literal that refers to the current value of the location counter
o BUFEND EQU *
There may be some literals that have the same name, but different values
o BASE *
o LDB =* (#LENGTH)
The literal =* repeatedly used in the program has the same name, but different
values
The literal "=*" represents an "address" in the program, so the assembler must
generate the appropriate "Modification records".
Literal table - LITTAB
Content
o Literal name
o Operand value and length
o Address
LITTAB is often organized as a hash table, using the literal name or value as the
key.
Implementation of Literals
Pass 1
Build LITTAB with literal name, operand value and length, leaving the address
unassigned
When an LTORG or END statement is encountered, assign an address to each literal
not yet assigned an address
o LOCCTR is updated to reflect the number of bytes occupied by each literal
Pass 2
Search LITTAB for each literal operand encountered
Generate data values using BYTE or WORD statements
Generate Modification record for literals that represent an address in the program
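A minimal sketch of LITTAB keyed by the generated data value, so that =C'EOF' and =X'454F46' share one entry. The class name LitTab and its method names are assumptions for illustration; the special literal =* is deliberately not handled, since its value changes with its position.

```python
def literal_value(literal):
    """Return the bytes generated by a literal such as =C'EOF' or =X'05'."""
    kind, text = literal[1], literal[3:-1]
    return text.encode("ascii") if kind == "C" else bytes.fromhex(text)

class LitTab:
    def __init__(self):
        self.entries = {}   # value bytes -> {"name", "length", "address"}

    def add(self, literal):
        """Pass 1: record the literal, leaving its address unassigned."""
        value = literal_value(literal)
        self.entries.setdefault(
            value, {"name": literal, "length": len(value), "address": None})

    def place_pool(self, locctr):
        """At LTORG or END: assign addresses and return the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

    def address_of(self, literal):
        """Pass 2: look up the address to use as the operand."""
        return self.entries[literal_value(literal)]["address"]
```

Keying by name instead of value would be simpler but would store the two spellings of the same constant twice.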
SYMTAB & LITTAB
2.3.2 Symbol-Defining Statements
Most assemblers provide an assembler directive that allows the programmer to
define symbols and specify their values.
The assembler directive used is EQU.
Syntax: symbol EQU value
Used to improve program readability, avoid magic numbers, and make it
easier to find and change constant values
Replace +LDT #4096 with
MAXLEN EQU 4096
+LDT #MAXLEN
Define mnemonic names for registers, so that an instruction such as RMO A,X can be written.
A EQU 0
X EQU 1
Expressions are allowed
MAXLEN EQU BUFEND-BUFFER
Assembler directive ORG
Allows the assembler to reset the location counter (LOCCTR) to a specified value
o Syntax: ORG value
When ORG is encountered, the assembler resets its LOCCTR to the specified
value.
ORG will affect the values of all labels defined until the next ORG.
If the previous value of LOCCTR can be automatically remembered, we can
return to the normal use of LOCCTR by simply writing
o ORG
Example: using ORG
If ORG statements are used
We can fetch the VALUE field by
LDA VALUE,X
X = 0, 11, 22, ... for each entry
Forward-Reference Problem
Forward reference is not allowed for either EQU or ORG.
All terms in the value field must have been defined previously in the program.
The reason is that all symbols must have been defined during Pass 1 in a two-pass
assembler.
Allowed:
ALPHA RESW 1
BETA EQU ALPHA
Not Allowed:
BETA EQU ALPHA
ALPHA RESW 1
2.3.3 Expressions
Assemblers allow the use of expressions as operands.
The assembler evaluates the expressions and produces a single operand address or
value.
Expressions consist of
Operators
o +, -, *, / (division is usually defined to produce an integer result)
Individual terms
o Constants
o User-defined symbols
o Special terms, e.g., *, the current value of LOCCTR
Examples
MAXLEN EQU BUFEND-BUFFER
STAB RESB (6+3+2)*MAXENTRIES
Relocation Problem in Expressions
Values of terms can be
o Absolute (independent of program location)
constants
o Relative (to the beginning of the program)
Address labels
* (value of LOCCTR)
Expressions can be
Absolute
o Only absolute terms.
MAXLEN EQU 1000
o Relative terms in pairs with opposite signs for each pair.
MAXLEN EQU BUFEND-BUFFER
Relative
o All the relative terms except one can be paired as described in "absolute".
The remaining unpaired relative term must have a positive sign.
STAB EQU OPTAB + (BUFEND - BUFFER)
Restriction of Relative Expressions
No relative terms may enter into a multiplication or division operation
o 3 * BUFFER
Expressions that do not meet the conditions of either "absolute" or "relative"
should be flagged as errors.
o BUFEND + BUFFER
o 100 - BUFFER
Handling Relative Symbols in SYMTAB
To determine the type of an expression, we must keep track of the types of all
symbols defined in the program.
We need a "flag" in the SYMTAB for indication.
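The pairing rule can be sketched by counting signed relative terms: a net count of 0 gives an absolute expression, a net count of +1 gives a relative one, and anything else is an error. The function name classify and the (value, is_relative) SYMTAB entries are assumptions for illustration; only + and - are modelled, so the multiplication and division restriction is outside this sketch.

```python
def classify(terms, symtab):
    """Evaluate a sum of signed terms and report its type.

    terms  -- list of (sign, term), sign is +1 or -1, term is an int or a symbol
    symtab -- maps symbol -> (value, is_relative), the "flag" kept per symbol
    """
    net_relative = 0
    value = 0
    for sign, term in terms:
        if isinstance(term, int):
            term_value, relative = term, False     # constants are absolute
        else:
            term_value, relative = symtab[term]
        value += sign * term_value
        if relative:
            net_relative += sign
    if net_relative == 0:
        return "absolute", value
    if net_relative == 1:
        return "relative", value
    raise ValueError("expression is neither absolute nor relative")
```

BUFEND - BUFFER comes out absolute, while BUFEND + BUFFER and 100 - BUFFER are rejected.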
2.3.4 Program Blocks
Allow the generated machine instructions and data to appear in the object
program in a different order from the source program
Separate blocks for storing code, data, stack, and larger data blocks
Program blocks versus control sections
o Program blocks
Segments of code that are rearranged within a single object
program unit.
o Control sections
Segments of code that are translated into independent object
program units.
The assembler rearranges these segments to gather together the pieces of each block
and assign addresses.
Separate the program into blocks in a particular order
Large buffer area is moved to the end of the object program
Program readability is better if data areas are placed in the source program close
to the statements that reference them.
Assembler directive: USE
USE [blockname]
At the beginning, statements are assumed to be part of the unnamed (default)
block
If no USE statements are included, the entire program belongs to this single block
Each program block may actually contain several separate segments of the source
program
Example
Three blocks are used
default: executable instructions.
CDATA: all data areas that are short in length.
CBLKS: all data areas that consist of larger blocks of memory.
Rearrange Codes into Program Blocks
Pass 1
A separate location counter for each program block
o Save and restore LOCCTR when switching between blocks
o At the beginning of a block, LOCCTR is set to 0.
Assign each label an address relative to the start of the block
Store the block name or number in the SYMTAB along with the assigned relative
address of the label
Indicate the block length as the latest value of LOCCTR for each block at the end
of Pass 1
Assign to each block a starting address in the object program by concatenating the
program blocks in a particular order
Pass 2
Calculate the address for each symbol relative to the start of the object program
by adding
o The location of the symbol relative to the start of its block
o The starting address of this block
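The two steps above, concatenating blocks and then resolving symbols, can be sketched as follows. The function names and the block lengths used in the usage note are illustrative assumptions, not values taken from the missing example figure.

```python
def block_starts(block_lengths):
    """End of Pass 1: concatenate blocks in order and return name -> start address.

    block_lengths is a dict listed in block-number order, name -> length.
    """
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts

def resolve(symbol, symtab, starts):
    """Pass 2: block start address plus the symbol's offset within its block."""
    block, offset = symtab[symbol]
    return starts[block] + offset
```

With lengths of 66, B and 1000 (hexadecimal) for default, CDATA and CBLKS, a symbol at offset 3 in CDATA resolves to 0069.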
Program Blocks Loaded in Memory
Object Program
It is not necessary to physically rearrange the generated code in the object
program
The assembler simply inserts the proper load address in each Text record.
The loader will load these codes into the correct place
2.3.5 Control Sections and Program Linking
Control sections
can be loaded and relocated independently of the others
are most often used for subroutines or other logical subdivisions of a program
the programmer can assemble, load, and manipulate each of these control sections
separately
because of this, there should be some means for linking control sections together
assembler directive: CSECT
secname CSECT
separate location counter for each control section
External Definition and Reference
Instructions in one control section may need to refer to instructions or data located
in another section
External definition
o EXTDEF name [, name]
o EXTDEF names symbols that are defined in this control section and may
be used by other sections
o Ex: EXTDEF BUFFER, BUFEND, LENGTH
External reference
o EXTREF name [, name]
o EXTREF names symbols that are used in this control section and are
defined elsewhere
o Ex: EXTREF RDREC, WRREC
To reference an external symbol, an extended format instruction is needed.
External Reference Handling
Case 1
15 0003 CLOOP +JSUB RDREC 4B100000
The operand RDREC is an external reference.
The assembler
o Has no idea where RDREC is
o Inserts an address of zero
o Can only use extended format to provide enough room (that is, relative
addressing for an external reference is invalid)
The assembler generates information for each external reference that will allow
the loader to perform the required linking.
Case 2
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
There are two external references in the expression, BUFEND and BUFFER.
The assembler
o Inserts a value of zero
o Passes information to the loader:
Add to this data area the address of BUFEND
Subtract from this data area the address of BUFFER
Case 3
On line 107, BUFEND and BUFFER are defined in the same control section, so
the expression can be calculated immediately.
107 1000 MAXLEN EQU BUFEND-BUFFER
Records for Object Program
The assembler must include information in the object program that will cause the
loader to insert proper values where they are required.
Define record (EXTDEF)
Col. 1       D
Col. 2-7     Name of external symbol defined in this control section
Col. 8-13    Relative address within this control section (hexadecimal)
Col. 14-73   Repeat information in Col. 2-13 for other external symbols
Refer record (EXTREF)
Col. 1       R
Col. 2-7     Name of external symbol referred to in this control section
Col. 8-73    Names of other external reference symbols
Modification record
Col. 1       M
Col. 2-7     Starting address of the field to be modified (hexadecimal)
Col. 8-9     Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10      Modification flag (+ or -)
Col. 11-16   External symbol whose value is to be added to or subtracted from the
             indicated field
A control section name is automatically an external symbol, i.e. it is available for
use in Modification records.
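As an illustration of the column layout, the sketch below splits a Modification record into its fields. It assumes that column 10 carries a + or - flag, as in the SIC/XE record format of the textbook; the function name and the sample records are made up for the example.

```python
def parse_modification_record(record):
    """Split a Modification record into (address, half_bytes, flag, symbol)."""
    if not record.startswith("M"):
        raise ValueError("not a Modification record")
    address = int(record[1:7], 16)      # Col. 2-7: start of the field (hex)
    half_bytes = int(record[7:9], 16)   # Col. 8-9: field length in half-bytes (hex)
    flag = record[9]                    # Col. 10: assumed to be '+' or '-'
    symbol = record[10:16].strip()      # Col. 11-16: external symbol name
    if flag not in ("+", "-"):
        raise ValueError("bad modification flag")
    return address, half_bytes, flag, symbol


jsub_fixup = parse_modification_record("M00000405+RDREC")
word_fixup = parse_modification_record("M00002806-BUFFER")
```

The first sample corresponds to a 5-half-byte address field inside an extended-format instruction, the second to a full 3-byte data word.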
Object Program
Expressions in Multiple Control Sections
Extended restriction
o Both terms in each pair of an expression must be within the same control
section
o Legal: BUFEND-BUFFER
o Illegal: RDREC-COPY
How to enforce this restriction
o When an expression involves external references, the assembler cannot
determine whether or not the expression is legal.
o The assembler evaluates all of the terms it can, combines these to form an
initial expression value, and generates Modification records.
o The loader checks the expression for errors and finishes the evaluation.
2.4. ASSEMBLER DESIGN
Assembler design options include:
Two-pass assembler with overlay structure
One-pass assemblers
Multi-pass assemblers
2.4.1 One-pass assembler
Load-and-Go Assembler
A load-and-go assembler generates its object code in memory for immediate
execution.
No object program is written out, and no loader is needed.
It is useful in a system oriented toward frequent program development and testing.
Programs are re-assembled nearly every time they are run, so the efficiency of the
assembly process is an important consideration.
One-Pass Assemblers
Scenarios for one-pass assemblers
o The object code is generated in memory for immediate execution (a
load-and-go assembler)
o External storage for the intermediate file between two passes is slow or
inconvenient to use
Main problem: forward references
o Data items
o Labels on instructions
Solution
o Require that all data areas be defined before they are referenced.
o It is possible, although inconvenient, to do so for data items.
o Forward jumps to instruction labels cannot be easily eliminated.
Insert (label, address_to_be_modified) into SYMTAB.
Usually, address_to_be_modified is stored in a linked list.
Sample program for a one-pass assembler
Forward Reference in One-pass Assembler
The assembler omits the operand address if the symbol has not yet been defined.
It enters this undefined symbol into SYMTAB and indicates that it is undefined.
It adds the address of this operand field to a list of forward references associated
with the SYMTAB entry.
When the definition of the symbol is encountered, it scans the reference list and
inserts the address.
At the end of the program, it reports an error if there are still SYMTAB entries
marked as undefined symbols.
For a load-and-go assembler
o Search SYMTAB for the symbol named in the END statement and jump
to this location to begin execution if there is no error.
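The forward-reference handling described above can be sketched as follows. The class and method names are hypothetical, memory is modelled as a plain dictionary from operand address to value, and the addresses in the usage lines are illustrative.

```python
class OnePassSymtab:
    """SYMTAB for a load-and-go assembler with per-symbol forward-reference lists."""

    def __init__(self):
        self.value = {}     # defined symbols -> address
        self.pending = {}   # undefined symbols -> operand addresses still to patch

    def reference(self, symbol, operand_addr, memory):
        if symbol in self.value:
            memory[operand_addr] = self.value[symbol]
        else:
            memory[operand_addr] = 0  # operand address omitted for now
            self.pending.setdefault(symbol, []).append(operand_addr)

    def define(self, symbol, address, memory):
        self.value[symbol] = address
        # Scan the reference list and insert the address everywhere it was needed.
        for operand_addr in self.pending.pop(symbol, []):
            memory[operand_addr] = address

    def undefined(self):
        """Symbols that are still undefined; non-empty at END means an error."""
        return sorted(self.pending)


memory = {}
symtab = OnePassSymtab()
symtab.reference("RDREC", 0x2013, memory)   # forward reference
symtab.reference("RDREC", 0x201F, memory)   # second forward reference
still_open = symtab.undefined()
symtab.define("RDREC", 0x203D, memory)      # definition encountered
```

A Python list stands in for the linked list mentioned above; the behaviour is the same for this purpose.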
Object Code in Memory and SYMTAB
After scanning line 40 of the above program
After scanning line 160 of the above program
If One-Pass Assemblers need to produce object codes
If the operand contains an undefined symbol, use 0 as the address and write the
Text record to the object program.
Forward references are entered into lists as in the load-and-go assembler.
When the definition of a symbol is encountered, the assembler generates another
Text record with the correct operand address for each entry in the reference list.
When the program is loaded, the incorrect address 0 is overwritten by the later
Text record that carries the correct address.
Object code generated by one-pass assembler
2.4.2 Two-pass assembler with overlay structure
Most assemblers divide the processing of the source program into two passes.
The internal tables and subroutines that are used only during Pass 1 are no longer
needed after the first pass is completed.
The routines and tables for Pass 1 and Pass 2 are never required at the same time.
There are certain tables (SYMTAB) and certain processing subroutines (searching
SYMTAB) that are used by both passes.
Since the Pass 1 and Pass 2 segments are never needed at the same time, they can
occupy the same locations in memory during execution of the assembler.
Initially the Root and Pass 1 segments are loaded into memory.
The assembler then makes the first pass over the program being assembled.
At the end of Pass 1, the Pass 2 segment is loaded, replacing the Pass 1
segment.
The assembler then makes its second pass over the source program and terminates.
The assembler needs much less memory to run in this way than it would need if both
Pass 1 and Pass 2 were loaded at the same time.
A program that is designed to execute in this way is called an overlay program
because some of its segments overlay others during execution.
2.4.3 Multi-Pass Assemblers
For a two-pass assembler, forward references in symbol definitions are not
allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
The symbol BETA cannot be assigned a value when it is encountered during Pass
1 because DELTA has not yet been defined.
Hence ALPHA cannot be evaluated during Pass 2.
Symbol definition must be completed in Pass 1.
Prohibiting forward references in symbol definitions is not a serious
inconvenience.
Forward references tend to create difficulty for a person reading the program.
The general solution for forward references is a multi-pass assembler that can
make as many passes as are needed to process the definitions of symbols.
It is not necessary for such an assembler to make more than 2 passes over the
entire program.
The portions of the program that involve forward references in symbol definitions
are saved during Pass 1.
Additional passes through these stored definitions are made as the assembly
progresses.
This process is followed by a normal Pass 2.
Implementation
For a forward reference in a symbol definition, we store in SYMTAB:
o The symbol name
o The defining expression
o The number of undefined symbols in the defining expression
Each undefined symbol (marked with a flag *) is associated with a list of the
symbols that depend on it.
When a symbol is defined, we can recursively evaluate the symbol expressions
that depend on the newly defined symbol.
Example of Multi-pass assembler
Consider the symbol table entries resulting from Pass 1 processing of the statement
HALFSZ EQU MAXLEN/2
Since MAXLEN has not yet been defined, no value for HALFSZ can be
computed.
The defining expression for HALFSZ is stored in the symbol table in place of its
value.
The entry &1 indicates that 1 symbol in the defining expression is undefined.
In practice, SYMTAB may simply contain a pointer to the defining expression.
The symbol MAXLEN is also entered in the symbol table, with the flag *
identifying it as undefined.
Associated with this entry is a list of the symbols whose values depend on
MAXLEN.
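A minimal sketch of this bookkeeping is given below. It is an assumption-laden model, not the textbook's table layout: the defining expression is held as a Python function plus the list of symbols it needs, and the class name, method names and addresses are invented for the example.

```python
class MultiPassSymtab:
    """Symbol table that defers evaluation of definitions with forward references."""

    def __init__(self):
        self.values = {}       # symbol -> value, once known
        self.pending = {}      # symbol -> [expression, operands, undefined count]
        self.dependents = {}   # undefined symbol (flag *) -> symbols waiting on it

    def define_expr(self, name, expression, operands):
        missing = {s for s in operands if s not in self.values}
        if not missing:
            self.define(name, expression(*[self.values[s] for s in operands]))
            return
        # Keep the defining expression and the count of undefined symbols (&n).
        self.pending[name] = [expression, operands, len(missing)]
        for symbol in missing:
            self.dependents.setdefault(symbol, []).append(name)

    def define(self, name, value):
        self.values[name] = value
        # Recursively evaluate every definition that was waiting on this symbol.
        for waiting in self.dependents.pop(name, []):
            entry = self.pending[waiting]
            entry[2] -= 1
            if entry[2] == 0:
                expression, operands, _ = self.pending.pop(waiting)
                self.define(waiting, expression(*[self.values[s] for s in operands]))


table = MultiPassSymtab()
table.define_expr("HALFSZ", lambda maxlen: maxlen // 2, ["MAXLEN"])
table.define_expr("MAXLEN", lambda bufend, buffer: bufend - buffer, ["BUFEND", "BUFFER"])
table.define("BUFFER", 0x1034)   # illustrative addresses
table.define("BUFEND", 0x2034)
```

Defining BUFEND completes MAXLEN, which in turn completes HALFSZ, without another pass over the whole program.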
UNIT III
LOADERS AND LINKERS
INTRODUCTION
A loader is a system program that performs the loading function.
Many loaders also support relocation and linking.
Some systems have a linker (linkage editor) to perform the linking operations and
a separate loader to handle relocation and loading.
One system loader or linker can be used regardless of the original source
programming language.
Loading: brings the object program into memory for execution.
Relocation: modifies the object program so that it can be loaded at an address
different from the location originally specified.
Linking: combines two or more separate object programs and supplies the
information needed to allow references between them.
3.1 BASIC LOADER FUNCTIONS
Fundamental functions of a loader:
1. Bringing an object program into memory.
2. Starting its execution.
3.1.1 Design of an Absolute Loader
For a simple absolute loader, all functions are accomplished in a single pass as follows:
1) The Header record of the object program is checked to verify that the correct program
has been presented for loading.
2) As each Text record is read, the object code it contains is moved to the indicated
address in memory.
3) When the End record is encountered, the loader jumps to the specified address to begin
execution of the loaded program.
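The three steps can be sketched as below, assuming SIC-style character records (H: name in columns 2-7; T: start address in columns 2-7, length in bytes in columns 8-9, then object code; E: transfer address in columns 2-7). The function name and the sample records are hypothetical, and the function returns the transfer address instead of jumping to it.

```python
def absolute_load(records, memory, expected_name):
    """Load H/T/E records into `memory` (a bytearray); return the transfer address."""
    header = records[0]
    if header[0] != "H" or header[1:7].strip() != expected_name:
        raise ValueError("wrong program presented for loading")
    for record in records[1:]:
        if record[0] == "T":
            address = int(record[1:7], 16)
            length = int(record[7:9], 16)
            # Each pair of hexadecimal characters becomes one byte in memory.
            code = bytes.fromhex(record[9:9 + 2 * length])
            memory[address:address + length] = code
        elif record[0] == "E":
            return int(record[1:7], 16)
    raise ValueError("missing End record")


memory = bytearray(0x2000)
records = ["HCOPY  001000000003", "T00100003141033", "E001000"]
start = absolute_load(records, memory, "COPY")
```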
An example object program is shown in Fig (a).
Fig (b) shows a representation of the program from Fig (a) after loading.
Algorithm for Absolute Loader
It is very important to realize that in Fig (a), each printed character represents one
byte of the object program record.
In Fig (b), on the other hand, each printed character represents one hexadecimal
digit in memory (a half-byte).
A loader that reads the character form must therefore pack each pair of characters
into one byte, which is inefficient.
To save space and loader execution time, most machines store
object programs in a binary form, with each byte of object code stored as a single
byte in the object program.
In this type of representation a byte may contain any binary value.
3.1.2 A Simple Bootstrap Loader
When a computer is first turned on or restarted, a special type of absolute loader, called a
bootstrap loader, is executed. This bootstrap loads the first program to be run by the
computer, usually an operating system.
Working of a simple Bootstrap loader
The bootstrap begins at address 0 in the memory of the machine.
It loads the operating system starting at address 80.
Each byte of object code to be loaded is represented on device F1 as two
hexadecimal digits, just as it is in a Text record of a SIC object program.
The object code from device F1 is always loaded into consecutive bytes of
memory, starting at address 80. The main loop of the bootstrap keeps the address
of the next memory location to be loaded in register X.
After all of the object code from device F1 has been loaded, the bootstrap jumps
to address 80, which begins the execution of the program that was loaded.
Much of the work of the bootstrap loader is performed by the subroutine GETC.
GETC is used to read and convert a pair of characters from device F1 representing
1 byte of object code to be loaded. For example, the two characters "D8" (bytes '4438'H)
are converted to the single byte 'D8'H.
The resulting byte is stored at the address currently in register X, using a STCH
instruction that refers to location 0 using indexed addressing.
The TIXR instruction is then used to add 1 to the value in X.
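The character-to-byte conversion can be illustrated as follows. The arithmetic (subtract 48, then a further 7 for the letters A to F) is one common way to do it and is assumed here rather than taken from the bootstrap source; the function names are hypothetical.

```python
def hex_digit_value(ch):
    """Convert one ASCII hexadecimal character ('0'-'9', 'A'-'F') to its value."""
    value = ord(ch) - 48     # '0' is ASCII 48
    if value >= 10:          # 'A'-'F' lie 7 positions beyond '9' in ASCII
        value -= 7
    return value


def pack_byte(high, low):
    """Combine two hexadecimal characters into the single byte they represent."""
    return (hex_digit_value(high) << 4) | hex_digit_value(low)


packed = pack_byte("D", "8")
```

Here the characters 'D' (44 hex) and '8' (38 hex) are packed into the one byte D8.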
Source code for bootstrap loader
3.2 MACHINE-DEPENDENT LOADER FEATURES
The absolute loader has several potential disadvantages. One of the most obvious
is the need for the programmer to specify the actual address at which the program
will be loaded into memory.
On a simple computer with a small memory, the actual address at which the
program will be loaded can be specified easily.
On a larger and more advanced machine, we often like to run several independent
programs together, sharing memory between them. We do not know in advance
where a program will be loaded. Hence we write relocatable programs instead of
absolute ones.
Writing absolute programs also makes it difficult to use subroutine libraries
efficiently. This could not be done effectively if all of the subroutines had
pre-assigned absolute addresses.
The need for program relocation is an indirect consequence of the change to
larger and more powerful computers. The way relocation is implemented in a
loader is also dependent upon machine characteristics.
Loaders that allow for program relocation are called relocating loaders or relative
loaders.
3.2.1 Relocation
Two methods for specifying relocation as part of the object program:
The first method:
A Modification record is used to describe each part of the object code that must be
changed when the program is relocated.
Fig (1): Consider the program
Most of the instructions in this program use relative or immediate addressing.
The only portions of the assembled program that contain actual addresses are the
extended format instructions on lines 15, 35, and 65. Thus these are the only items
whose values are affected by relocation.
Object program
Each Modification record specifies the starting address and length of the field
whose value is to be altered.
It then describes the modification to be performed.
In this example, all modifications add the value of the symbol COPY, which
represents the starting address of the program.
Fig (2): Consider a relocatable program for a standard SIC machine
The Modification record is not well suited for use with all machine
architectures. Consider, for example, the program in Fig (2). This is a relocatable
program written for the standard version of SIC.
The important difference between this example and the one in Fig (1) is that the
standard SIC machine does not use relative addressing.
In this program the addresses in all the instructions except RSUB must be modified
when the program is relocated. This would require 31 Modification records,
which results in an object program more than twice as large as the one in Fig (1).
The second method:
There are no Modification records.
The Text records are the same as before except that there is a relocation bit
associated with each word of object code.
Since all SIC instructions occupy one word, this means that there is one relocation
bit for each possible instruction.
Fig (3): Object program with relocation by bit mask
The relocation bits are gathered together into a bit mask following the length
indicator in each Text record. In Fig (3) this mask is represented (in character
form) as three hexadecimal digits.
If the relocation bit corresponding to a word of object code is set to 1, the
program's starting address is to be added to this word when the program is
relocated. A bit value of 0 indicates that no modification is necessary.
If a Text record contains fewer than 12 words of object code, the bits
corresponding to unused words are set to 0.
For example, the bit mask FFC (representing the bit string 111111111100) in the
first Text record specifies that all 10 words of object code are to be modified
during relocation.
Example: Note that the LDX instruction on line 210 (Fig (2)) begins a new Text
record. If it were placed in the preceding Text record, it would not be properly
aligned to correspond to a relocation bit because of the 1-byte data value
generated from line 185.
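The bit-mask method can be sketched as below. The function name, the sample words and the load address 5000 are assumptions for illustration; the leftmost mask bit is taken to correspond to the first word of the Text record.

```python
def relocate_text_record(words, mask_hex, start_address):
    """Add `start_address` to every word whose relocation bit is 1."""
    if len(words) > 12:
        raise ValueError("a 3-digit mask covers at most 12 words")
    mask = int(mask_hex, 16)
    relocated = []
    for i, word in enumerate(words):
        bit = (mask >> (11 - i)) & 1   # bit 11 belongs to the first word
        if bit:
            word = (word + start_address) & 0xFFFFFF   # keep the result in 24 bits
        relocated.append(word)
    return relocated


ffc_bits = format(int("FFC", 16), "012b")
# Two instructions with addresses, then a data word that must stay unchanged.
result = relocate_text_record([0x140033, 0x481039, 0x000003], "C00", 0x5000)
```

Compared with Modification records, the loader needs only one shift and test per word, at the price of requiring every relocatable item to sit on a word boundary within the record.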
3.2.2 Program Linking
Consider the three (separately assembled) programs in the figure, each of which consists
of a single control section.
Program 1 (PROGA):
Program 2 (PROGB):
Program 3 (PROGC):
Consider first the reference marked REF1.
For the first program (PROGA),
REF1 is simply a reference to a label within the program.
It is assembled in the usual way as a PC-relative instruction.
No modification for relocation or linking is necessary.
In PROGB, the same operand refers to an external symbol.
The assembler uses an extended-format instruction with the address field set to
00000.
The object program for PROGB contains a Modification record instructing the
loader to add the value of the symbol LISTA to this address field when the
program is linked.
For PROGC, REF1 is handled in exactly the same way.
Corresponding object programs
PROGA:
PROGB:
PROGC:
The reference marked REF2 is processed in a similar manner.
REF3 is an immediate operand whose value is to be the difference between
ENDA and LISTA (that is, the length of the list in bytes).
In PROGA, the assembler has all of the information necessary to compute this
value. During the assembly of PROGB (and PROGC), the values of the labels are
unknown.
In these programs, the expression must be assembled as an external reference
(with two Modification records) even though the final result will be an absolute
value independent of the locations at which the programs are loaded.
Consider REF4.
The assembler for PROGA can evaluate all of the expression in REF4 except for
the value of LISTC. This results in an initial value of '000014'H and one
Modification record.
The same expression in PROGB contains no terms that can be evaluated by the
assembler. The object code therefore contains an initial value of 000000 and three
Modification records.
For PROGC, the assembler can supply the value of LISTC relative to the
beginning of the program (but not the actual address, which is not known until the
program is loaded).
The initial value of this data word contains the relative address of LISTC
('000030'H). Modification records instruct the loader to add the beginning
address of the program (i.e., the value of PROGC), to add the value of ENDA,
and to subtract the value of LISTA.
Fig (4): The three programs as they might appear in memory after loading and
linking.
PROGA has been loaded starting at address 4000, with PROGB and PROGC
immediately following.
For example, the value for reference REF4 in PROGA is located at address 4054 (the
beginning address of PROGA plus 0054).
Fig (5): Relocation and linking operations performed on REF4 in PROGA
The initial value (from the Text record) is 000014. To this is added the address assigned
to LISTC, which is 4112 (the beginning address of PROGC plus 30). The result,
000014 + 004112 = 004126, is the final value of REF4 in PROGA.
3.2.3 Algorithm and Data Structures for a Linking Loader
The algorithm for a linking loader is considerably more complicated than the
absolute loader algorithm.
A linking loader usually makes two passes over its input, just as an assembler
does. In terms of general function, the two passes of a linking loader are quite
similar to the two passes of an assembler:
Pass 1 assigns addresses to all external symbols.
Pass 2 performs the actual loading, relocation, and linking.
The main data structure needed for our linking loader is an external symbol table,
ESTAB.
(1) This table, which is analogous to SYMTAB in our assembler algorithm, is
used to store the name and address of each external symbol in the set of
control sections being loaded.
(2) A hashed organization is typically used for this table.
Two other important variables are PROGADDR (program load address) and
CSADDR (control section address).
(1) PROGADDR is the beginning address in memory where the linked program
is to be loaded. Its value is supplied to the loader by the OS.
(2) CSADDR contains the starting address assigned to the control section
currently being scanned by the loader. This value is added to all relative
addresses within the control section to convert them to actual addresses.
3.2.3.1 PASS 1
During Pass 1, the loader is concerned only with Header and Define record types
in the control sections.
Algorithm for Pass 1 of a Linking loader
1) The beginning load address for the linked program (PROGADDR) is obtained from
the OS. This becomes the starting address (CSADDR) for the first control section in the
input sequence.
2) The control section name from the Header record is entered into ESTAB, with the value
given by CSADDR. All external symbols appearing in the Define record for the control
section are also entered into ESTAB. Their addresses are obtained by adding the value
specified in the Define record to CSADDR.
3) When the End record is read, the control section length CSLTH (which was saved
from the Header record) is added to CSADDR. This calculation gives the starting address
for the next control section in sequence.
At the end of Pass 1, ESTAB contains all external symbols defined in the set of
control sections together with the address assigned to each.
Many loaders include as an option the ability to print a load map that shows these
symbols and their addresses.
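Pass 1 can be sketched as follows. Each control section is modelled as a dictionary holding what the Header and Define records would supply; this layout, the function name and the section lengths are assumptions, with the lengths chosen so that PROGA loads at 4000 and LISTC lands at 4112 as in the REF4 discussion.

```python
def pass1(control_sections, progaddr):
    """Build ESTAB: assign an address to every external symbol."""
    estab = {}
    csaddr = progaddr                       # first section starts at PROGADDR
    for section in control_sections:
        entries = [(section["name"], 0)] + list(section["defs"].items())
        for symbol, relative in entries:
            if symbol in estab:
                raise ValueError("duplicate external symbol: " + symbol)
            estab[symbol] = csaddr + relative
        csaddr += section["length"]         # CSLTH from the Header record
    return estab


sections = [
    {"name": "PROGA", "length": 0x63, "defs": {"LISTA": 0x40, "ENDA": 0x54}},
    {"name": "PROGB", "length": 0x7F, "defs": {"LISTB": 0x60, "ENDB": 0x70}},
    {"name": "PROGC", "length": 0x51, "defs": {"LISTC": 0x30, "ENDC": 0x42}},
]
estab = pass1(sections, 0x4000)
load_map = sorted(estab.items(), key=lambda item: item[1])
```

A Python dictionary plays the role of the hashed ESTAB, and `load_map` is the kind of listing a loader might print as its load map.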
3.2.3.2 PASS 2
Pass 2 performs the actual loading, relocation, and linking of the program.
Algorithm for Pass 2 of a Linking loader
1) As each Text record is read, the object code is moved to the specified address (plus the
current value of CSADDR).
2) When a Modification record is encountered, the symbol whose value is to be used for
modification is looked up in ESTAB.
3) This value is then added to or subtracted from the indicated location in memory.
4) The last step performed by the loader is usually the transferring of control to the
loaded program to begin execution.
The End record for each control section may contain the address of the first
instruction in that control section to be executed. Our loader takes this as the
transfer point to begin execution. If more than one control section specifies a
transfer address, the loader arbitrarily uses the last one encountered.
If no control section contains a transfer address, the loader uses the beginning of
the linked program (i.e., PROGADDR) as the transfer point.
Normally, a transfer address would be placed in the End record for a main
program, but not for a subroutine.
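Steps 2 and 3 can be sketched as below. The function name and argument order are assumptions, memory is a bytearray, and the address used for RDREC is hypothetical. For an odd number of half-bytes the sketch leaves the high half-byte of the first byte untouched, which is the usual convention for the 20-bit address field of an extended-format instruction.

```python
def apply_modification(memory, csaddr, rel_addr, half_bytes, flag, symbol, estab):
    """Add or subtract an ESTAB value at CSADDR + rel_addr, in place."""
    address = csaddr + rel_addr
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[address:address + nbytes], "big")
    delta = estab[symbol] if flag == "+" else -estab[symbol]
    field_mask = (1 << (4 * half_bytes)) - 1
    if half_bytes % 2:
        # Odd length: preserve the leading half-byte (flag bits of the instruction).
        keep = field & (0xF << (4 * half_bytes))
        field = keep | ((field + delta) & field_mask)
    else:
        field = (field + delta) & field_mask
    memory[address:address + nbytes] = field.to_bytes(nbytes, "big")


memory = bytearray(0x5000)
estab = {"LISTC": 0x4112, "RDREC": 0x4200}            # RDREC value is hypothetical
memory[0x4054:0x4057] = bytes.fromhex("000014")       # REF4 in PROGA, initial value
apply_modification(memory, 0x4000, 0x0054, 6, "+", "LISTC", estab)
memory[0x4003:0x4007] = bytes.fromhex("4B100000")     # +JSUB with zero address field
apply_modification(memory, 0x4000, 0x0004, 5, "+", "RDREC", estab)
```

The first call reproduces the REF4 arithmetic, 000014 + 004112 = 004126.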
This algorithm can be made more efficient. Assign a reference number, which is used
(instead of the symbol name) in Modification records, to each external symbol referred to
in a control section. Suppose we always assign the reference number 01 to the control
section name. The loader can then find a symbol's value by indexing into a table with the
reference number instead of searching ESTAB for every Modification record.
Fig (6): Object programs using reference numbers for code modification
3.3 MACHINE-INDEPENDENT LOADER FEATURES
Loading and linking are often thought of as OS service functions. Therefore, most
loaders include fewer different features than are found in a typical assembler.
They include the use of an automatic library search process for handling external
references and some common options that can be selected at the time of loading
and linking.
3.3.1 Automatic Library Search
Many linking loaders can automatically incorporate routines from a subprogram
library into the program being loaded.
Linking loaders that support automatic library search must keep track of external
symbols that are referred to, but not defined, in the primary input to the loader.
At the end of Pass 1, the symbols in ESTAB that remain undefined represent
unresolved external references.
The loader searches the library or libraries specified for routines that contain the
definitions of these symbols, and processes the subroutines found by this search
exactly as if they had been part of the primary input stream.
The subroutines fetched from a library in this way may themselves contain
external references. It is therefore necessary to repeat the library search process
until all references are resolved.
If unresolved external references remain after the library search is completed,
these must be treated as errors.
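The repeated search can be sketched as follows. Control sections are modelled as dictionaries with a name, the set of symbols they define and the set they refer to; this layout and all routine names are hypothetical.

```python
def link_with_library(primary, library):
    """Return (names of loaded sections, unresolved symbols) after library search."""
    loaded = list(primary)
    defined = set()
    for section in loaded:
        defined |= {section["name"]} | section["defs"]
    unresolved = set().union(*(s["refs"] for s in loaded)) - defined
    index = {}                                  # symbol -> library section defining it
    for section in library:
        for symbol in {section["name"]} | section["defs"]:
            index.setdefault(symbol, section)
    errors = set()
    while unresolved:
        symbol = unresolved.pop()
        section = index.get(symbol)
        if section is None:
            errors.add(symbol)                  # still unresolved after the search
            continue
        loaded.append(section)                  # treat as part of the primary input
        defined |= {section["name"]} | section["defs"]
        unresolved |= section["refs"] - errors  # fetched routines may refer further
        unresolved -= defined
    return [s["name"] for s in loaded], sorted(errors)


primary = [{"name": "MAIN", "defs": set(), "refs": {"SQRT", "MISSING"}}]
library = [
    {"name": "SQRT", "defs": set(), "refs": {"ERRMSG"}},
    {"name": "ERRMSG", "defs": set(), "refs": set()},
    {"name": "UNUSED", "defs": set(), "refs": set()},
]
names, errors = link_with_library(primary, library)
```

ERRMSG is pulled in only because the fetched SQRT routine refers to it, which is why the search has to be repeated.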
3.3.2 Loader Options
Many loaders allow the user to specify options that modify the standard
processing.
Typical loader option 1: Allows the selection of alternative sources of input.
Ex: INCLUDE program-name (library-name) might direct the loader to read the
designated object program from a library and treat it as if it were part of the
primary loader input.
Loader option 2: Allows the user to delete external symbols or entire control
sections.
Ex: DELETE csect-name might instruct the loader to delete the named control
section(s) from the set of programs being loaded.
CHANGE name1, name2 might cause the external symbol name1 to be changed
to name2 wherever it appears in the object programs.
Loader option 3: Involves the automatic inclusion of library routines to satisfy
external references.
Ex: LIBRARY MYLIB
Such user-specified libraries are normally searched before the standard system
libraries. This allows the user to use special versions of the standard routines.
NOCALL STDDEV, PLOT, CORREL
This instructs the loader that these external references are to remain unresolved. It
avoids the overhead of loading and linking the unneeded routines, and saves the
memory space that would otherwise be required.
3.4 LOADER DESIGN OPTIONS
Linking loaders perform all linking and relocation at load time.
There are two alternatives:
1. Linkage editors, which perform linking prior to load time.
2. Dynamic linking, in which the linking function is performed at execution
time.
Precondition: The source program is first assembled or compiled, producing an
object program.
A linking loader performs all linking and relocation operations, including
automatic library search if specified, and loads the linked program directly into
memory for execution.
A linkage editor produces a linked version of the program (load module or
executable image), which is written to a file or library for later execution.
3.4.1 Linkage Editors
The linkage editor performs relocation of all control sections relative to the start
of the linked program. Thus, all items that need to be modified at load time have
values that are relative to the start of the linked program.
This means that the loading can be accomplished in one pass with no external
symbol table required.
If a program is to be executed many times without being reassembled, the use of a
linkage editor substantially reduces the overhead required.
Linkage editors can perform many useful functions besides simply preparing an
object program for execution. For example, a typical sequence of linkage editor
commands:
INCLUDE PLANNER (PROGLIB)
DELETE PROJECT {delete from existing PLANNER}
INCLUDE PROJECT (NEWLIB) {include new version}
REPLACE PLANNER (PROGLIB)
Linkage editors can also be used to build packages of subroutines or other control
sections that are generally used together. This can be useful when dealing with
subroutine libraries that support high-level programming languages.
Linkage editors often include a variety of other options and commands like those
discussed for linking loaders. Compared to linking loaders, linkage editors in
general tend to offer more flexibility and control.
Fig (7): Processing of an object program using (a) Linking loader and (b) Linkage
editor
3.4.2 Dynamic Linking
Linkage editors perform linking operations before the program is loaded for
execution.
Linking loaders perform these same operations at load time.
Dynamic linking (also called dynamic loading or load on call) postpones the linking
function until execution time: a subroutine is loaded and linked to the rest of the
program when it is first called.
Dynamic linking is often used to allow several executing programs to share one
copy of a subroutine or library, e.g. run-time support routines for a high-level
language like C.
With a program that allows its user to interactively call any of the subroutines of a
large mathematical and statistical library, all of the library subroutines could
potentially be needed, but only a few will actually be used in any one execution.
Dynamic linking avoids the necessity of loading the entire library for each
execution; only the subroutines actually needed are loaded.
Fig (a): Instead of executing a JSUB instruction referring to an external symbol, the
program makes a load-and-call service request to the OS. The parameter of this request
is the symbolic name of the routine to be called.
Fig (b): The OS examines its internal tables to determine whether or not the routine is
already loaded. If necessary, the routine is loaded from the specified user or system
libraries.
Fig (c): Control is then passed from the OS to the routine being called.
Fig (d): When the called subroutine completes its processing, it returns to its caller
(i.e., the OS). The OS then returns control to the program that issued the request.
Fig (e): If a subroutine is still in memory, a second call to it may not require another load
operation. Control may simply be passed from the dynamic loader to the called routine.
3.4.3 Bootstrap Loaders
With the machine empty and idle there is no need for program relocation.
We can specify the absolute address for whatever program is first loaded; this
will be the OS, which occupies a predefined location in memory.
We need some means of accomplishing the functions of an absolute loader:
1. To have the operator enter into memory the object code for an absolute loader,
using switches on the computer console.
2. To have the absolute loader program permanently resident in a ROM.
3. To have a built-in hardware function that reads a fixed-length record from
some device into memory at a fixed location.
When some hardware signal occurs, the machine begins to execute this ROM
program.
On some computers, the program is executed directly in the ROM; on others, the
program is copied from ROM to main memory and executed there.
The particular device to be used can often be selected via console switches.
After the read operation is complete, control is automatically transferred to the
address in memory where the record was stored, which contains machine
instructions that load the absolute program that follows.
If the loading process requires more instructions than can be read in a single
record, this first record causes the reading of others, and these in turn can cause
the reading of still more records, hence the term bootstrap.
The first record is generally referred to as a bootstrap loader.
Such a loader is added to the beginning of all object programs that are to be
loaded into an empty and idle system.
This includes the OS itself and all stand-alone programs that are to be run without
an OS.
UNIT IV
MACROPROCESSORS
INTRODUCTION
Macro Instructions
A macro instruction (macro)
It is simply a notational convenience for the programmer to write a
shorthand version of a program.
It represents a commonly used group of statements in the source program.
It is replaced by the macro processor with the corresponding group of
source language statements. This operation is called "expanding the
macro".
For example:
Suppose it is necessary to save the contents of all registers before calling a
subroutine.
This requires a sequence of instructions.
We can define and use a macro, SAVEREGS, to represent this sequence
of instructions.
Macro Processor
A macro processor
Its functions essentially involve the substitution of one group of characters
or lines for another.
Normally, it performs no analysis of the text it handles.
It is not concerned with the meaning of the involved statements during macro
expansion.
Therefore, the design of a macro processor generally is machine independent.
Macro processors are used in
assembly language
high-level programming languages, e.g., C or C++
OS command languages
general-purpose applications
Format of macro definition
A macro can be defined as follows
MACRO - MACRO pseudo-op shows the start of a macro definition.
Name [List of Parameters] - Macro name with a list of formal parameters.
....
....
.... - Sequence of assembly language instructions.
MEND - MEND (MACRO-END) pseudo-op shows the end of a macro definition.
Example:
MACRO
SUM X,Y
LDA X
MOV BX,X
LDA Y
ADD BX
MEND
4.1 BASIC MACROPROCESSOR FUNCTIONS
The fundamental functions common to all macro processors are:
1. Macro Definition
2. Macro Invocation
3. Macro Expansion
Macro Definition and Expansion
Two new assembler directives are used in macro definition:
o MACRO: identifies the beginning of a macro definition
o MEND: identifies the end of a macro definition
Prototype for the macro:
o Each parameter begins with '&'
label   op      operands
name    MACRO   parameters
        :
        body
        :
        MEND
Body: The statements that will be generated as the expansion of the macro.
The example is a SIC/XE program that uses macro instructions.
This program defines and uses two macro instructions, RDBUFF and WRBUFF.
The functions and logic of the RDBUFF macro are similar to those of the RDREC
subroutine.
The WRBUFF macro is similar to the WRREC subroutine.
Two assembler directives (MACRO and MEND) are used in macro definitions.
The first MACRO statement identifies the beginning of a macro definition.
The symbol in the label field (RDBUFF) is the name of the macro, and the entries in the
operand field identify the parameters of the macro instruction.
In our macro language, each parameter begins with the character &, which facilitates
the substitution of parameters during macro expansion.
The macro name and parameters define the pattern or prototype for the macro
instruction used by the programmer. In the expanded program, the macro instruction
definitions have been deleted, since they are no longer needed after the macros are expanded.
Each macro invocation statement has been expanded into the statements that form
the body of the macro, with the arguments from the macro invocation substituted for
the parameters in the macro prototype.
The arguments and parameters are associated with one another according to their
positions.
Macro Invocation
A macro invocation statement (a macro call) gives the name of the macro
instruction being invoked and the arguments to be used in expanding the macro.
The processes of macro invocation and subroutine call are quite different.
o Statements of the macro body are expanded each time the macro is
invoked.
o Statements of the subroutine appear only once, regardless of how many
times the subroutine is called.
The macro invocation statements are treated as comments, and the statements
generated from macro expansion are assembled as though they had been
written by the programmer.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form
the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the
macro prototype.
o The arguments and parameters are associated with one another according
to their positions.
o The first argument in the macro invocation corresponds to the first
parameter in the macro prototype, etc.
Comment lines within the macro body have been deleted, but comments on
individual statements have been retained.
The macro invocation statement itself has been included as a comment line.
Example of a macro expansion
In expanding the macro invocation on line 190, the argument F1 is substituted for
the parameter &INDEV wherever it occurs in the body of the macro.
Similarly, BUFFER is substituted for &BUFADR and LENGTH is substituted for
&RECLTH.
Lines 190a through 190m show the complete expansion of the macro invocation
on line 190.
The label on the macro invocation statement, CLOOP, has been retained as a label
on the first statement generated in the macro expansion.
This allows the programmer to use a macro instruction in exactly the same way as
an assembler language mnemonic.
After macro processing, the expanded file can be used as input to the assembler.
The macro invocation statements will be treated as comments, and the statements
generated from the macro expansions will be assembled exactly as though they
had been written directly by the programmer.
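The positional substitution described above can be sketched in a few lines of Python. This is only an illustration, not the processor from the text: the function name expand_invocation, the 8-column label field, and the abridged five-statement body are assumptions made for the example (the full RDBUFF body is not reproduced in these notes).

```python
import re

def expand_invocation(label, params, body, args):
    """Expand one macro call by positional substitution.

    params: formal parameters from the prototype, e.g. ["&INDEV", ...]
    body:   statements of the macro body (strings) using those parameters
    args:   actual arguments from the invocation, in the same order
    """
    if len(args) != len(params):
        raise ValueError("argument count does not match the prototype")
    binding = dict(zip(params, args))          # associate by position
    if params:
        # Longest names first, so that &ID1 is never mistaken for &ID.
        names = sorted(params, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(p) for p in names))
        body = [pattern.sub(lambda m: binding[m.group(0)], s) for s in body]
    # The label of the invocation is kept on the first generated statement.
    return [f"{(label if i == 0 else ''):<8}{s}" for i, s in enumerate(body)]

# Abridged, hypothetical body for RDBUFF (illustration only).
body = ["TD   =X'&INDEV'", "JEQ  *-3", "RD   =X'&INDEV'",
        "STCH &BUFADR,X", "STX  &RECLTH"]
for line in expand_invocation("CLOOP", ["&INDEV", "&BUFADR", "&RECLTH"],
                              body, ["F1", "BUFFER", "LENGTH"]):
    print(line)
```

The first generated line carries the label CLOOP, and every occurrence of &INDEV, &BUFADR and &RECLTH is replaced by F1, BUFFER and LENGTH respectively.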
4.1.1 Macro Processor Algorithm and Data Structures
It is easy to design a two-pass macro processor in which all macro definitions are
processed during the first pass, and all macro invocation statements are expanded
during the second pass.
Such a two-pass macro processor would not allow the body of one macro
instruction to contain definitions of other macros.
Example 1:
Example 2:
Defining MACROS or MACROX does not define RDBUFF and the other macro
instructions. These definitions are processed only when an invocation of
MACROS or MACROX is expanded.
A one-pass macro processor that can alternate between macro definition and macro
expansion is able to handle macros like these.
There are 3 main data structures involved in our macro processor.
Definition table (DEFTAB)
1. The macro definitions themselves are stored in the definition table (DEFTAB), which
contains the macro prototype and the statements that make up the macro body.
2. Comment lines from the macro definition are not entered into DEFTAB because they
will not be part of the macro expansion.
Name table (NAMTAB)
1. The macro names are entered into NAMTAB, which serves as an index to DEFTAB.
(References to macro instruction parameters in DEFTAB are converted to a positional
notation.)
2. For each macro instruction defined, NAMTAB contains pointers to the beginning and
end of the definition in DEFTAB.
Argument table (ARGTAB)
1. The third data structure is an argument table (ARGTAB), which is used during the
expansion of macro invocations.
2. When a macro invocation statement is recognized, the arguments are stored in
ARGTAB according to their position in the argument list.
3. As the macro is expanded, arguments from ARGTAB are substituted for the
corresponding parameters in the macro body.
Positional notation is used for the parameters. The parameter &INDEV has
been converted to ?1, &BUFADR has been converted to ?2.
When the ?n notation is recognized in a line from DEFTAB, a simple indexing
operation supplies the proper argument from ARGTAB.
Algorithm:
The procedure DEFINE, which is called when the beginning of a macro definition
is recognized, makes the appropriate entries in DEFTAB and NAMTAB.
EXPAND is called to set up the argument values in ARGTAB and expand a
macro invocation statement.
The procedure GETLINE gets the next line to be processed.
This line may come from DEFTAB or from the input file, depending upon
whether the Boolean variable EXPANDING is set to TRUE or FALSE.
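A minimal Python sketch of the procedures named above (DEFINE, EXPAND, GETLINE, PROCESSLINE) and the three tables may help to see how they fit together. It is a simplification under stated assumptions, not the algorithm as printed in the textbook: the class name MacroProcessor, the blank-separated statement layout (a line starting with a blank has no label), and the "." prefix for comment lines are all choices made for this example. As in the design described here, a macro invocation inside a macro body is not supported.

```python
import re

def fields(line):
    """Split a statement into (label, opcode, operands).
    Assumed layout: blank-separated fields; a leading blank means no label."""
    parts = line.split()
    if line[:1].isspace():
        parts.insert(0, "")
    parts += [""] * 3
    return parts[0], parts[1], parts[2]

def to_positional(line, params):
    """Rewrite &NAME parameter references as ?1, ?2, ... for DEFTAB."""
    if not params:
        return line
    index = {p: n for n, p in enumerate(params, start=1)}
    names = sorted(params, key=len, reverse=True)      # longest name first
    pattern = "|".join(re.escape(p) for p in names)
    return re.sub(pattern, lambda m: f"?{index[m.group(0)]}", line)

class MacroProcessor:
    def __init__(self, source_lines):
        self.source = iter(source_lines)
        self.deftab = []         # prototypes and body lines (parameters as ?n)
        self.namtab = {}         # macro name -> (start, end) indexes in DEFTAB
        self.argtab = []         # arguments of the invocation being expanded
        self.expanding = False   # True while lines are taken from DEFTAB
        self.next_def = 0        # DEFTAB index of the next line to expand
        self.output = []

    def getline(self):
        """Next line from DEFTAB (arguments substituted) or from the input."""
        if not self.expanding:
            return next(self.source, None)
        line = self.deftab[self.next_def]
        self.next_def += 1
        def arg(m):
            n = int(m.group(1))
            return self.argtab[n - 1] if n <= len(self.argtab) else ""
        return re.sub(r"\?(\d+)", arg, line)

    def define(self, name, operands):
        params = [p for p in operands.split(",") if p]
        start = len(self.deftab)
        self.deftab.append(f"{name} {operands}")        # macro prototype
        level = 1
        while level > 0:
            line = self.getline()
            if line is None:
                raise ValueError("MEND missing")
            if line.lstrip().startswith("."):           # comment line: skip
                continue
            opcode = fields(line)[1]
            if opcode == "MACRO":
                level += 1
            elif opcode == "MEND":
                level -= 1
            self.deftab.append(to_positional(line, params))
        self.namtab[name] = (start, len(self.deftab) - 1)

    def expand(self, label, name, operands):
        start, end = self.namtab[name]
        self.argtab = operands.split(",") if operands else []
        self.output.append(f". {label} {name} {operands}")  # call as comment
        self.expanding, self.next_def, first = True, start + 1, True
        while self.next_def < end:                      # stop before MEND
            line = self.getline()
            if first and label:
                line = f"{label:<8}{line.lstrip()}"     # keep the call's label
            first = False
            self.processline(line)
        self.expanding = False

    def processline(self, line):
        label, opcode, operands = fields(line)
        if line.lstrip().startswith("."):
            self.output.append(line)
        elif opcode in self.namtab:
            self.expand(label, opcode, operands)
        elif opcode == "MACRO":
            self.define(label, operands)
        else:
            self.output.append(line)

    def run(self):
        while (line := self.getline()) is not None:
            self.processline(line)
            if fields(line)[1] == "END":
                break
        return self.output
```

Running it on a small source file with one definition (SUM) and one invocation produces the invocation as a comment line followed by the substituted body, with the label FIRST moved to the first generated statement:

```python
src = ["SUM     MACRO   &X,&Y",
       ". add two words",
       "        LDA     &X",
       "        ADD     &Y",
       "        MEND",
       "FIRST   SUM     ALPHA,BETA",
       "        END     FIRST"]
print("\n".join(MacroProcessor(src).run()))
```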
4.2 MACHINE INDEPENDENT MACRO PROCESSOR FEATURES
Machine-independent macro processor features are extended features that are not directly
related to the architecture of the computer for which the macro processor is written.
4.2.1 Concatenation of Macro Parameter
Most macro processors allow parameters to be concatenated with other character
strings.
Suppose a program contains a set of series of variables:
XA1, XA2, XA3, ...
XB1, XB2, XB3, ...
If similar processing is to be performed on each series of variables, the
programmer might want to incorporate this processing into a macro instruction.
The parameter to such a macro instruction could specify the series of variables to
be operated on (A, B, C, ...).
The macro processor constructs the symbols by concatenating X, (A, B, ...), and
(1, 2, 3, ...) in the macro expansion.
Suppose such a parameter is named &ID. The macro body may contain a statement
LDA X&ID1
in which &ID is concatenated after the string "X" and before the string "1":
LDA XA1 (&ID=A)
LDA XB1 (&ID=B)
Ambiguity problem:
E.g., X&ID1 may mean
"X" + &ID + "1"
"X" + &ID1
This problem occurs because the end of the parameter is not marked.
Solution to this ambiguity problem:
Use a special concatenation operator "→" to specify the end of the parameter:
LDA X&ID→1
so that the end of the parameter &ID is clearly identified.
Macro definition
Macro invocation statements
The macro processor deletes all occurrences of the concatenation operator
immediately after performing parameter substitution, so the character → will not
appear in the macro expansion.
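The effect of the concatenation operator can be sketched as follows. The function name substitute and the rule that a parameter name consists of & followed by letters and digits are assumptions for this illustration; the arrow character stands for the concatenation operator.

```python
import re

def substitute(line, args):
    """Replace &NAME parameters, then delete the concatenation operator."""
    def lookup(match):
        name = match.group(0)
        return args.get(name, name)      # unknown symbols are left unchanged
    # A parameter name runs up to the first character that is not a letter
    # or digit, so the arrow in X&ID→1 marks where &ID ends.
    line = re.sub(r"&[A-Za-z0-9]+", lookup, line)
    return line.replace("→", "")

print(substitute("LDA X&ID→1", {"&ID": "A"}))   # &ID is recognized
print(substitute("LDA X&ID1", {"&ID": "A"}))    # read as &ID1, not &ID
```

Without the operator the scanner reads the longer name &ID1, which is exactly the ambiguity described above; with it, &ID is substituted and the operator is removed afterwards.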
4.2.2 Generation of Unique Labels
Labels in the macro body may cause a "duplicate labels" problem if the macro is
invoked and expanded multiple times.
Use of relative addressing at the source statement level is very inconvenient,
error-prone, and difficult to read.
It is highly desirable to
1. Let the programmer use labels in the macro body.
   Labels used within the macro body begin with $.
2. Let the macro processor generate unique labels for each macro invocation and
   expansion.
   During macro expansion, the $ will be replaced with $xx, where xx
   is a two-character alphanumeric counter of the number of macro
   instructions expanded.
   xx = AA, AB, AC, ...
Consider the definition of WRBUFF
5     COPY    START   0
      :
      :
135           TD      =X'&OUTDEV'
      :
140           JEQ     *-3
      :
155           JLT     *-14
      :
255           END     FIRST
If a label were placed on the TD instruction on line 135, this label would be
defined twice, once for each invocation of WRBUFF.
This duplicate definition would prevent correct assembly of the resulting
expanded program.
The jump instructions on lines 140 and 155 are written using the relative
operands *-3 and *-14, because it is not possible to place a label on line 135 of the
macro definition.
This relative addressing may be acceptable for short jumps such as "JEQ *-3".
For longer jumps spanning several instructions, such notation is very
inconvenient, error-prone and difficult to read.
Many macro processors avoid these problems by allowing the creation of special
types of labels within macro instructions.
RDBUFF definition
Labels within the macro body begin with the special character $.
Macro expansion
Unique labels are generated within the macro expansion.
Each symbol beginning with $ has been modified by replacing $ with $AA.
The character $ will be replaced by $xx, where xx is a two-character
alphanumeric counter of the number of macro instructions expanded.
For the first macro expansion in a program, xx will have the value AA. For
succeeding macro expansions, xx will be set to AB, AC, etc.
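The label-generation rule can be sketched as below. The names suffix and expand_labels are hypothetical, and for simplicity the counter uses letters only (AA ... ZZ), which is an assumption; the text only says that the counter is two alphanumeric characters starting at AA.

```python
import re
from string import ascii_uppercase

def suffix(count):
    """Two-character counter: 0 -> AA, 1 -> AB, ..., 26 -> BA (letters only)."""
    if not 0 <= count < 26 * 26:
        raise ValueError("counter out of range for two letters")
    return ascii_uppercase[count // 26] + ascii_uppercase[count % 26]

def expand_labels(body, count):
    """Replace the $ that starts each label with $xx for this expansion."""
    tag = "$" + suffix(count)
    # Only a $ followed by a letter is treated as the start of a label.
    return [re.sub(r"\$(?=[A-Za-z])", lambda m: tag, line) for line in body]

body = ["$LOOP   TD    =X'F1'", "        JEQ   $LOOP"]
print(expand_labels(body, 0))   # first expansion in the program
print(expand_labels(body, 1))   # second expansion
```

Because every expansion receives a different suffix, the same macro can be invoked repeatedly without producing duplicate label definitions.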
4.2.3 Conditional Macro Expansion
Arguments in a macro invocation can be used to:
o Substitute for the parameters in the macro body without changing the
sequence of statements expanded.
o Modify the sequence of statements for conditional macro expansion (or
conditional assembly, when related to an assembler).
This capability adds greatly to the power and flexibility of a macro
language.
Consider the example
Two additional parameters are used in the example of conditional macro expansion:
o &EOR: specifies a hexadecimal character code that marks the end of a
record
o &MAXLTH: specifies the maximum length of a record
Macro-time variable (SET symbol)
o can be used to
  store working values during the macro expansion
  store the evaluation result of a Boolean expression
  control the macro-time conditional structures
o begins with "&" and is not a macro instruction parameter
o is initialized to a value of 0
o is set by a macro processor directive, SET
Macro-time conditional structure
o IF-ELSE-ENDIF
o WHILE-ENDW
4.2.3.1 Implementation of Conditional Macro Expansion (IF-ELSE-ENDIF
Structure)
A symbol table is maintained by the macro processor.
o This table contains the values of all macro-time variables used.
o Entries in this table are made or modified when SET statements are
processed.
o This table is used to look up the current value of a macro-time variable
whenever it is required.
The testing of the condition and looping are done while the macro is being
expanded.
When an IF statement is encountered during the expansion of a macro, the
specified Boolean expression is evaluated. If the value is
o TRUE
  The macro processor continues to process lines from DEFTAB
  until it encounters the next ELSE or ENDIF statement.
  If ELSE is encountered, it then skips to ENDIF.
o FALSE
  The macro processor skips ahead in DEFTAB until it finds the
  next ELSE or ENDIF statement.
4.2.3.2 Implementation of Conditional Macro Expansion (WHILE-ENDW
Structure)
When a WHILE statement is encountered during the expansion of a macro, the
specified Boolean expression is evaluated. If the value is
o TRUE
  The macro processor continues to process lines from DEFTAB
  until it encounters the next ENDW statement.
  When ENDW is encountered, the macro processor returns to the
  preceding WHILE, re-evaluates the Boolean expression, and takes
  action again.
o FALSE
  The macro processor skips ahead in DEFTAB until it finds the
  next ENDW statement and then resumes normal macro expansion.
4.2.4 Keyword Macro Parameters
Positional parameters
o Parameters and arguments are associated according to their positions in
the macro prototype and invocation. The programmer must specify the
arguments in the proper order.
o If an argument is to be omitted, a null argument should be used to
maintain the proper order in the macro invocation statement.
o For example: suppose a macro instruction GENER has 10 possible
parameters, but in a particular invocation of the macro only the 3rd and 9th
parameters are to be specified.
o The statement is GENER ,,DIRECT,,,,,,3.
o This is not suitable if a macro has a large number of parameters and only a
few of these are given values in a typical invocation.
Keyword parameters
o Each argument value is written with a keyword that names the
corresponding parameter.
o Arguments may appear in any order.
o Null arguments no longer need to be used.
o If the 3rd parameter is named &TYPE and the 9th parameter is named
&CHANNEL, the macro invocation would be
GENER TYPE=DIRECT,CHANNEL=3.
o It is easier to read and much less error-prone than the positional method.
Consider the example
Here each parameter name is followed by an equal sign, which identifies a keyword
parameter, and a default value is specified for some of the parameters.
Here the value of &INDEV is specified as F3 and the value of &EOR is specified as null.
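Keyword parameters with defaults can be sketched as a dictionary lookup. The function names and the sample prototype string below are assumptions for illustration (the figure with the actual prototype is not reproduced in these notes); only the rules themselves come from the text: arguments may appear in any order, omitted ones take their default, and a keyword followed by nothing gives a null value.

```python
def parse_prototype(operands):
    """Map each keyword parameter (without &) to its default value."""
    defaults = {}
    for item in operands.split(","):
        name, _, default = item.partition("=")
        defaults[name.lstrip("&")] = default
    return defaults

def bind_keywords(defaults, operands):
    """Combine the defaults with the keyword arguments of one invocation."""
    args = dict(defaults)
    for item in filter(None, operands.split(",")):
        key, sep, val = item.partition("=")
        if not sep or key not in defaults:
            raise ValueError(f"unknown or malformed keyword argument: {item}")
        args[key] = val                    # an empty value means null
    return args

# Hypothetical prototype with defaults for three of the five parameters.
defaults = parse_prototype("&INDEV=F1,&BUFADR=,&RECLTH=,&EOR=04,&MAXLTH=4096")
print(bind_keywords(defaults, "RECLTH=LENGTH,BUFADR=BUFFER,EOR=,INDEV=F3"))
```

In this invocation INDEV becomes F3 and EOR becomes null, while MAXLTH keeps its default because it is not mentioned.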
4.3. MACROPROCESSOR DESIGN OPTIONS
4.3.1 Recursive Macro Expansion
RDCHAR:
o reads one character from a specified device into register A
o should be defined beforehand (i.e., before RDBUFF)
Implementation of Recursive Macro Expansion
The previous macro processor design cannot handle this kind of recursive macro
invocation and expansion, e.g., RDBUFF BUFFER, LENGTH, F1
Reasons:
1) The procedure EXPAND would be called recursively, thus the invocation
arguments in ARGTAB would be overwritten.
2) The Boolean variable EXPANDING would be set to FALSE when the
"inner" macro expansion is finished; that is, the macro processor would
forget that it had been in the middle of expanding an "outer" macro.
3) A similar problem would occur with PROCESSLINE, since this procedure
too would be called recursively.
Solutions:
1) Write the macro processor in a programming language that allows
recursive calls, so that local variables will be retained.
2) Use a stack to take care of pushing and popping local variables and return
addresses.
Another problem: can a macro invoke itself recursively?
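The first solution, writing the expander in a language with recursive calls, can be sketched as follows. Each call keeps its own argument binding in a local variable, so the outer macro's arguments survive the expansion of the inner one. The representation of macros as (parameters, body) pairs, the whole-operand substitution, and the depth limit are simplifying assumptions for this example, and the macro bodies are abridged.

```python
def expand(name, args, macros, depth=0):
    """Recursively expand macro `name`; statements are (opcode, operands)."""
    if depth > 50:                      # guards against endless self-invocation
        raise RecursionError("macro expansion nested too deeply")
    params, body = macros[name]
    binding = dict(zip(params, args))   # local to this call, like a private ARGTAB
    out = []
    for opcode, operands in body:
        operands = [binding.get(o, o) for o in operands]
        if opcode in macros:            # invocation inside a macro body
            out.extend(expand(opcode, operands, macros, depth + 1))
        else:
            out.append((opcode, operands))
    return out

macros = {
    "RDCHAR": (["&IN"], [("TD", ["&IN"]), ("RD", ["&IN"])]),
    "RDBUFF": (["&BUFADR", "&RECLTH", "&INDEV"],
               [("CLEAR", ["X"]), ("RDCHAR", ["&INDEV"]), ("STCH", ["&BUFADR"])]),
}
print(expand("RDBUFF", ["BUFFER", "LENGTH", "F1"], macros))
```

After the inner RDCHAR expansion returns, the outer call still holds its own binding, so &BUFADR is correctly replaced by BUFFER in the following statement.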
4.3.2 One-Pass Macro Processor
A one-pass macro processor that alternates between macro definition and macro
expansion in a recursive way is able to handle recursive macro definition.
Because of the one-pass structure, the definition of a macro must appear in the
source program before any statements that invoke that macro.
Handling Recursive Macro Definition
In the DEFINE procedure
o When a macro definition is being entered into DEFTAB, the normal
approach is to continue until an MEND directive is reached.
o This would not work for recursive macro definition because the first
MEND encountered in the inner macro would terminate the whole macro
definition process.
o To solve this problem, a counter LEVEL is used to keep track of the level
of macro definitions.
  Increase LEVEL by 1 each time a MACRO directive is read.
  Decrease LEVEL by 1 each time an MEND directive is read.
  An MEND can terminate the whole macro definition process only
  when LEVEL reaches 0.
  This process is very much like matching left and right parentheses
  when scanning an arithmetic expression.
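The LEVEL counter can be sketched in isolation. The function name collect_definition is hypothetical, and the sketch assumes that MACRO or MEND appears among the first two blank-separated words of a statement (label and opcode fields).

```python
def collect_definition(lines):
    """Collect one macro definition up to its matching MEND.

    `lines` is an iterator positioned just after the outer MACRO statement.
    """
    level = 1                          # the outer MACRO has already been read
    body = []
    for line in lines:
        words = line.split()[:2]       # label and opcode fields
        if "MACRO" in words:
            level += 1                 # a nested definition begins
        elif "MEND" in words:
            level -= 1                 # a definition ends
        body.append(line)
        if level == 0:                 # only the matching MEND ends the outer one
            return body
    raise ValueError("MEND missing")

source = iter(["RDBUFF  MACRO   &INDEV", "        TD      =X'&INDEV'",
               "        MEND", "WRBUFF  MACRO   &OUTDEV",
               "        WD      =X'&OUTDEV'", "        MEND",
               "        MEND", "        LDA     ALPHA"])
definition = collect_definition(source)
print(len(definition), "lines stored;", "next input line:", next(source).strip())
```

The two inner MEND statements only bring LEVEL back to 1; the third one brings it to 0 and ends the outer definition, leaving the following statement in the input.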
4.3.3 Two-Pass Macro Processor
Two-pass macro processor
o Pass 1:
  Process macro definitions
o Pass 2:
  Expand all macro invocation statements
Problem
o This kind of macro processor cannot allow recursive macro definition, that
is, the body of a macro containing definitions of other macros (because all
macros would have to be defined during the first pass before any macro
invocations were expanded).
Example of Recursive Macro Definition
MACROS (for SIC)
o Contains the definitions of RDBUFF and WRBUFF written in SIC
instructions.
MACROX (for SIC/XE)
o Contains the definitions of RDBUFF and WRBUFF written in SIC/XE
instructions.
A program that is to be run on a SIC system could invoke MACROS, whereas a
program to be run on SIC/XE could invoke MACROX.
Defining MACROS or MACROX does not define RDBUFF and WRBUFF.
These definitions are processed only when an invocation of MACROS or
MACROX is expanded.
4.3.4 General-Purpose Macro Processors
Goal
Macro processors that do not depend on any particular programming language,
but can be used with a variety of different languages.
Advantages
Programmers do not need to learn many macro languages.
Although its development costs are somewhat greater than those for a
language-specific macro processor, this expense does not need to be repeated for each
language, thus saving substantial overall cost.
Disadvantages
A large number of details must be dealt with in a real programming language:
Situations in which normal macro parameter substitution should not occur, e.g.,
comments.
Facilities for grouping together terms, expressions, or statements
Tokens, e.g., identifiers, constants, operators, keywords
Syntax
4.3.5 Macro Processing within Language Translators
Macro processors can be
1) Preprocessors
o Process macro definitions.
o Expand macro invocations.
o Produce an expanded version of the source program, which is then used as
input to an assembler or compiler.
2) Line-by-line macro processor
o Used as a sort of input routine for the assembler or compiler.
o Read source program.
o Process macro definitions and expand macro invocations.
o Pass output lines to the assembler or compiler.
3) Integrated macro processor
4.3.5.1 Line-by-Line Macro Processor
Benefits
It avoids making an extra pass over the source program.
Data structures required by the macro processor and the language translator can
be combined (e.g., OPTAB and NAMTAB).
Utility subroutines can be used by both the macro processor and the language
translator.
o Scanning input lines
o Searching tables
o Data format conversion
It is easier to give diagnostic messages related to the source statements.
4.3.5.2 Integrated Macro Processor
An integrated macro processor can potentially make use of any information about
the source program that is extracted by the language translator.
As an example in FORTRAN:
DO 100 I = 1,20
is a DO statement:
• DO: keyword
• 100: statement number
• I: variable name
DO 100 I = 1
is an assignment statement:
• DO100I: variable (blanks are not significant in
FORTRAN)
An integrated macro processor can support macro instructions that depend upon
the context in which they occur.
Drawbacks of Line-by-line or Integrated Macro Processor
They must be specially designed and written to work with a particular
implementation of an assembler or compiler.
The cost of macro processor development is added to the cost of the language
translator, which results in more expensive software.
The assembler or compiler will be considerably larger and more complex.
UNIT V
TEXT- EDITORS
OVERVIEW OF THE EDITING PROCESS.
An interactive editor is a computer program that allows a user to create and revise
a target document. The term document includes objects such as computer programs,
texts, equations, tables, diagrams, line art and photographs, that is, anything that one might find
on a printed page. A text editor is one in which the primary elements being edited are
character strings of the target text. The document editing process is an interactive
user-computer dialogue designed to accomplish four tasks:
1) Select the part of the target document to be viewed and manipulated.
2) Determine how to format this view on-line and how to display it.
3) Specify and execute operations that modify the target document.
4) Update the view appropriately.
Traveling: Selection of the part of the document to be viewed and edited involves
first traveling through the document to locate the area of interest, with commands such as "next
screenful", "bottom", and "find pattern". Traveling specifies where the area of interest is.
Filtering: The selection of what is to be viewed and manipulated is controlled by
filtering. Filtering extracts the relevant subset of the target document at the point of
interest, such as the next screenful of text or the next statement.
Formatting: Formatting determines how the result of filtering will be seen as a visible
representation (the view) on a display screen or other device.
Editing: In the actual editing phase, the target document is created or altered with a set of
operations such as insert, delete, replace, move or copy.
Manuscript-oriented editors operate on elements such as single characters, words, lines,
sentences and paragraphs; program-oriented editors operate on elements such as
identifiers, keywords and statements.

THE USER-INTERFACE OF AN EDITOR.
The user of an interactive editor is presented with a conceptual model of the
editing system. The model is an abstract framework on which the editor and the world on
which it operates are based. The line editors simulated the world of the keypunch:
they allowed operations on numbered sequences of 80-character card-image lines.
The screen editors define a world in which a document is represented as a
quarter-plane of text lines, unbounded both down and to the right. The user sees, through
a cutout, only a rectangular subset of this plane on a multi-line display terminal. The
cutout can be moved left or right, and up or down, to display other portions of the
document. The user interface is also concerned with the input devices, the output devices,
and the interaction language of the system.
INPUT DEVICES: The input devices are used to enter elements of the text being edited, to
enter commands, and to designate editable elements. Input devices are categorized as: 1)
Text devices 2) Button devices 3) Locator devices
1) Text or string devices are typically typewriter-like keyboards on which the user presses
and releases keys, sending a unique code for each key. Virtually all computer keyboards are
of the QWERTY type.
2) Button or Choice devices generate an interrupt or set a system flag, usually causing
an invocation of an associated application program. Special function keys are also
available on the keyboard. Alternatively, buttons can be simulated in software by
displaying text strings or symbols on the screen. The user chooses a string or symbol
instead of pressing a button.

3) Locator devices: They are two-dimensional analog-to-digital converters that position
a cursor symbol on the screen by observing the user's movement of the device. The most
common such devices are the mouse and the tablet.
The Data Tablet is a flat, rectangular, electromagnetically sensitive panel. Either a
ballpoint-pen-like stylus or a puck, a small device similar to a mouse, is moved over the
surface. The tablet returns to a system program the coordinates of the position on the
data tablet at which the stylus or puck is currently located. The program can then map
these data-tablet coordinates to screen coordinates and move the cursor to the
corresponding screen position. Text devices with arrow (cursor) keys can be used to
simulate locator devices. Each of these keys shows an arrow that points up, down, left or
right. Pressing an arrow key typically generates an appropriate character sequence; the
program interprets this sequence and moves the cursor in the direction of the arrow on the
key pressed.
VOICE-INPUT DEVICES: These devices, which translate spoken words to their textual
equivalents, may prove to be the text input devices of the future. Voice recognizers are currently
available for command input on some systems.
OUTPUT DEVICES: The output devices let the user view the elements being edited and
the results of the editing operations.
The first output devices were teletypewriters and other character-printing terminals
that generated output on paper.
Next came "glass teletypes" based on Cathode Ray Tube (CRT) technology, which used the
CRT screen essentially to simulate the hard-copy teletypewriter.
Today's advanced CRT terminals use hardware assistance for such features as
moving the cursor, inserting and deleting characters and lines, and scrolling lines and
pages.
The modern professional workstations are based on personal computers with
high-resolution displays; they support multiple proportionally spaced character fonts to produce
realistic facsimiles of hard-copy documents.
INTERACTION LANGUAGE:
The interaction language of the text editor is generally one of several common
types.
The typing-oriented or text command-oriented method: It is the oldest of the major
editing interfaces. The user communicates with the editor by typing text strings both for
command names and for operands. These strings are sent to the editor and are usually
echoed to the output device. Typed specification often requires the user to remember the
exact form of all commands, or at least their abbreviations. If the command language is
complex, the user must continually refer to a manual or an on-line Help function. The
typing required can be time-consuming for inexperienced users.
Function key interfaces: Each command is associated with a marked key on the
keyboard. This eliminates much typing. E.g.: Insert key, Shift key, Control key

Disadvantages:
Have too many unique keys
Multiple keystroke commands
Menu-oriented interface: A menu is a multiple-choice set of text strings or icons, which
are graphical symbols that represent objects or operations. The user can perform actions
by selecting items from the menus. The editor prompts the user with a menu. One problem
with menu-oriented systems can arise when there are many possible actions and several
choices are required to complete an action. The display area of the menu is rather limited.
Most text editors have a structure similar to that shown above.
The Command Language Processor: It accepts input from the user's input devices, and analyzes
the tokens and syntactic structure of the commands. It functions much like the lexical and
syntactic phases of a compiler. The command language processor may invoke the semantic
routines directly. In a text editor, these semantic routines perform functions such as editing and
viewing. The semantic routines involve traveling, editing, viewing and display functions. Editing
operations are always specified by the user, and display operations are specified implicitly by the
other three categories of operations. Traveling and viewing operations may be invoked either
explicitly by the user or implicitly by the editing operations.
Editing Component
In editing a document, the start of the area to be edited is determined by the current
editing pointer maintained by the editing component, which is the collection of modules dealing
with editing tasks. The current editing pointer can be set or reset explicitly by the user using
traveling commands, such as next paragraph and next screen, or implicitly as a side effect of the
previous editing operation, such as delete paragraph.
Traveling Component
The traveling component of the editor actually performs the setting of the current editing
and viewing pointers, and thus determines the point at which the viewing and/or editing filtering
begins.
Viewing Component
The start of the area to be viewed is determined by the current viewing pointer. This
pointer is maintained by the viewing component of the editor, which is a collection of modules
responsible for determining the next view. The current viewing pointer can be set or reset
explicitly by the user or implicitly by the system as a result of the previous editing operation. The
viewing component formulates an ideal view, often expressed in a device-independent
intermediate representation. This view may be a very simple one consisting of a window's worth
of text arranged so that lines are not broken in the middle of words.
Display Component
It takes the idealized view from the viewing component and maps it to a physical output
device in the most efficient manner. The display component produces a display by mapping the
buffer to a rectangular subset of the screen, usually a window.
Editing Filter
Filtering consists of the selection of contiguous characters beginning at the current point.
The editing filter filters the document to generate a new editing buffer based on the current
editing pointer as well as on the editing filter parameters.
Editing Buffer
It contains the subset of the document filtered by the editing filter based on the editing
pointer and editing filter parameters.
Viewing Filter
When the display needs to be updated, the viewing component invokes the viewing filter.
This component filters the document to generate a new viewing buffer based on the current
viewing pointer as well as on the viewing filter parameters.
Viewing Buffer
It contains the subset of the document filtered by the viewing filter based on the viewing
pointer and viewing filter parameters. E.g., the user of a certain editor might travel to line 75, and
after viewing it, decide to change all occurrences of "ugly duckling" to "swan" in lines 1 through
50 of the file by using a change command such as
[1,50] c/ugly duckling/swan/
As a part of the editing command there is implicit travel to the first line of the file. Lines
1 through 50 are then filtered from the document to become the editing buffer. Successive
substitutions take place in this editing buffer without corresponding updates of the view.
In line editors, the viewing buffer may contain the current line; in screen editors, this
buffer may contain a rectangular cutout of the quarter-plane of text. This viewing buffer is then
passed to the display component of the editor, which produces a display by mapping the buffer to
a rectangular subset of the screen, usually called a window.
The editing and viewing buffers, while independent, can be related in many ways. In the
simplest case, they are identical: the user edits the material directly on the screen. On the other
hand, the editing and viewing buffers may be completely disjoint.
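The separation between the editing buffer and the viewing buffer in the "ugly duckling" example can be sketched as two independent filters over the same document. The function names change and view, and the representation of the document as a list of lines, are assumptions made for this illustration.

```python
def change(document, first, last, old, new):
    """Apply a change command such as [first,last] c/old/new/.

    The editing filter selects lines first..last (1-based, inclusive) as the
    editing buffer; the substitutions are made there and written back.
    """
    editing_buffer = document[first - 1:last]
    editing_buffer = [line.replace(old, new) for line in editing_buffer]
    return document[:first - 1] + editing_buffer + document[last:]

def view(document, pointer, height):
    """Viewing filter: `height` lines starting at the viewing pointer."""
    return document[pointer - 1:pointer - 1 + height]

doc = [f"line {n}: the ugly duckling" for n in range(1, 101)]
viewing_pointer = 75                      # the user has traveled to line 75
doc = change(doc, 1, 50, "ugly duckling", "swan")
print(view(doc, viewing_pointer, 2))      # the view at line 75 is unaffected
print(view(doc, 50, 2))                   # line 50 changed, line 51 did not
```

Here the two buffers are completely disjoint: the edit touches lines 1 through 50 while the viewing pointer stays at line 75, so no update of the view is needed.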
Windows typically cover the entire screen or a rectangular portion of it. Mapping viewing
buffers to windows that cover only part of the screen is especially useful for editors on
modern graphics-based workstations. Such systems can support multiple windows,
simultaneously showing different portions of the same file or portions of different files.
This approach allows the user to perform inter-file editing operations much more
effectively than with a system having only a single window.
The mapping of the viewing buffer to a window is accomplished by two components of
the system.
(i) First, the viewing component formulates an ideal view, often expressed in a
device-independent intermediate representation. This view may be a very
simple one consisting of a window's worth of text arranged so that lines are not
broken in the middle of words. At the other extreme, the idealized view may be a
facsimile of a page of fully formatted and typeset text with equations, tables and
figures.
(ii) Second, the display component takes these idealized views from the viewing
component and maps them to a physical output device in the most efficient manner
possible.
The components of the editor deal with a user document on two le0els>
(i) In main memory and
(ii) In the disk file system!
&oading an entire document into main memory may #e infeasi#le! 9owe0er if
only part of a document is loaded and if many user specified operations re-uire a dis*
read #y the editor to locate the affected portions2 editing might #e unaccepta#ly slow! In
some systems this pro#lem is sol0ed #y the mapping the entire file into virtual memory
and letting the operating system perform efficient demand paging!
An alternative is to provide the editor with paging routines which read one or more
logical portions of a document into memory as needed. Such portions are often termed
pages, although there is usually no relationship between these pages and the hard copy
document pages or virtual memory pages. These pages remain resident in main memory
until a user operation requires that another portion of the document be loaded.
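An editor paging routine of this kind might look like the sketch below. It assumes, purely for illustration, that a page is a fixed number of text lines and that only one page is resident at a time; the class name PagedDocument and its fields are hypothetical.

```python
from itertools import islice

class PagedDocument:
    """Keeps one logical page (a run of lines) of a document in memory."""

    def __init__(self, path, lines_per_page=100):
        self.path = path
        self.lines_per_page = lines_per_page
        self.resident_page = None   # index of the page currently in memory
        self.lines = []             # contents of that page
        self.loads = 0              # number of disk reads performed

    def _load(self, page):
        start = page * self.lines_per_page
        with open(self.path, encoding="utf-8") as f:
            # islice skips to the page without keeping earlier lines in memory.
            chunk = islice(f, start, start + self.lines_per_page)
            self.lines = [ln.rstrip("\n") for ln in chunk]
        self.resident_page = page
        self.loads += 1

    def line(self, number):
        """Return line `number` (0-based), reading the disk only on a page miss."""
        page, offset = divmod(number, self.lines_per_page)
        if page != self.resident_page:
            self._load(page)
        return self.lines[offset]
```

Repeated accesses within the resident page cost no disk reads; the loads counter makes that visible.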
Editors function in three basic types of computing environment:
(i) Time-sharing environment
(ii) Stand-alone environment and
(iii) Distributed environment.
Each type of environment imposes some constraint on the design of an editor.
The Time-Sharing Environment: The time-sharing editor must function swiftly within the context
of the load on the computer's processor, central memory and I/O devices.
The Stand-Alone Environment: The editor on a stand-alone system must have access to the
functions that the time-sharing editor obtains from its host operating system. These may be
provided in part by a small local operating system, or they may be built into the editor itself if the
stand-alone system is dedicated to editing.
The Distributed Environment: The editor operating in a
distributed resource-sharing local network must, like a stand-alone editor, run independently on
each user's machine and must, like a time-sharing editor, contend for shared resources such as
files.
INTERACTIVE DEBUGGING SYSTEMS
An interactive debugging system provides programmers with facilities that aid in the testing
and debugging of programs interactively.
DEBUGGING FUNCTIONS AND CAPABILITIES
Execution sequencing: This is the observation and control of the flow of program execution.
For example, the program may be halted after a fixed number of instructions are executed.
Breakpoints: The programmer may define breakpoints, which cause execution to be
suspended when a specified point in the program is reached. After execution is suspended,
debugging commands are used to analyze the progress of the program and to diagnose errors
detected. Execution of the program can then be resumed.
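Both ideas can be illustrated with a toy machine. In this sketch a program is assumed to be a list of Python functions that each update a shared state dictionary; the function run and its parameters are hypothetical and are not part of any real debugger.

```python
def run(program, state, pc=0, breakpoints=frozenset(), max_steps=None):
    """Execute `program` from index `pc`.

    Returns (pc, reason), where reason is "breakpoint", "step limit"
    or "finished". The returned pc is where execution can be resumed.
    """
    steps = 0
    resuming = True
    while pc < len(program):
        # Suspend before executing an instruction that carries a breakpoint,
        # unless this call is resuming from that very instruction.
        if pc in breakpoints and not resuming:
            return pc, "breakpoint"
        # Execution sequencing: halt after a fixed number of instructions.
        if max_steps is not None and steps >= max_steps:
            return pc, "step limit"
        program[pc](state)
        pc += 1
        steps += 1
        resuming = False
    return pc, "finished"
```

Between a suspension and the next call to run, the caller can inspect or change state, which corresponds to the analysis step described above.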
Conditional Expressions: Programmers can define conditional expressions that are
evaluated during the debugging session. Program execution is suspended when the conditions are
met, analysis is made, and execution is later resumed.
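A rough sketch of a conditional expression, using Python's standard sys.settrace hook. As a simplification, it does not truly suspend the program; it records a snapshot of the local variables at the first point where the condition holds. The name run_until and the shape of the condition callback are assumptions made for this example.

```python
import sys

def run_until(func, condition):
    """Run func() and snapshot its locals the first time condition(locals) is true.

    Returns (result, snapshot); snapshot is None if the condition never held.
    """
    found = []

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None                      # ignore every other function
        if event == "line" and not found and condition(frame.f_locals):
            found.append(dict(frame.f_locals))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func()
    finally:
        sys.settrace(previous)               # always restore the old hook
    return result, (found[0] if found else None)
```

The condition is evaluated before each source line of func executes, so it should tolerate variables that have not been assigned yet, for example by using locals.get("name").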
Gaits: Given a good graphical representation of program progress, it may even be useful to
run the program at various speeds, called gaits. A debugging system should also provide
functions such as tracing and traceback. Tracing can be used to track the flow of execution logic
and data modifications. The control flow can be traced at different levels of detail: procedure,
branch, individual instruction, and so on.
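Tracing at the procedure level of detail can be sketched with sys.settrace from the Python standard library. The helper trace_calls is a hypothetical name; it records only the entry to each Python function and ignores individual lines.

```python
import sys

def trace_calls(func, *args):
    """Run func(*args); return (result, names of Python functions entered)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None          # no local trace function: procedure level only

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, calls
```

Returning the tracer itself instead of None would request per-line events as well, which corresponds to a finer level of detail.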
Traceback can show the path by which the current statement in the program was reached. It can
also show which statements have modified a given variable or parameter. The statements are
displayed symbolically, in terms of the source program, rather than as hexadecimal displacements.
Program-display Capabilities: It is also
important for a debugging system to have good program-display capabilities. It must be possible
to display the program being debugged, complete with statement numbers.
Multilingual Capability: A debugging system should consider the language in which the program being
debugged is written. Most user environments and many application systems involve the use of
different programming languages. A single debugging tool should be available for multilingual
situations.
Context Effects
The context being used has many different effects on the debugging interaction. For
example, assignment statements are different depending on the language:
COBOL - MOVE 6.5 TO X
FORTRAN - X = 6.5
Likewise, conditional statements should use the notation of the source language:
COBOL - IF A NOT EQUAL TO B
FORTRAN - IF (A .NE. B)
Similar differences exist with respect to the form of statement labels, keywords and so
on.
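One simple way a debugger could honor the source-language notation is a per-language template table, sketched below. The table NOTATION and the function render are hypothetical names; only the two languages and the two statement forms shown in the notes are covered.

```python
# Assumed notation table: one template per (language, statement kind).
NOTATION = {
    "COBOL": {
        "assign": "MOVE {value} TO {var}",
        "not_equal": "IF {a} NOT EQUAL TO {b}",
    },
    "FORTRAN": {
        "assign": "{var} = {value}",
        "not_equal": "IF ({a} .NE. {b})",
    },
}

def render(language, kind, **fields):
    """Format a debugger statement in the notation of the source language."""
    return NOTATION[language][kind].format(**fields)
```

For example, render("COBOL", "assign", var="X", value=6.5) gives "MOVE 6.5 TO X", while the same request for FORTRAN gives "X = 6.5".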
Display of source code
The language translator may provide the source code or source listing tagged in some
standard way so that the debugger has a uniform method of navigating about it.
Optimization:
It is also important that a debugging system be able to deal with optimized code. Many
optimizations involve the rearrangement of segments of code in the program.
For example:
- invariant expressions can be removed from loops
- separate loops can be combined into a single loop
- redundant expressions may be eliminated
- unnecessary branch instructions may be eliminated
The debugging of
optimized code requires a substantial amount of cooperation from the optimizing compiler.
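The first optimization in the list, removing an invariant expression from a loop, can be shown by hand. The two functions below are illustrative only: the second is what an optimizing compiler might effectively produce from the first.

```python
def scaled_sum_as_written(values, a, b):
    total = 0
    for v in values:
        factor = a * b      # invariant: recomputed on every iteration
        total += v * factor
    return total

def scaled_sum_as_optimized(values, a, b):
    factor = a * b          # hoisted: computed once, before the loop
    total = 0
    for v in values:
        total += v * factor
    return total
```

Both versions return the same result, but a breakpoint placed on the factor assignment would be reached once per iteration in the first version and only once in the second. This is the kind of mismatch between source and executed code that the debugger needs compiler cooperation to explain.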
Relationship with Other Parts of the System
An interactive debugger must be related to other parts of the system in many different
ways.
Availability: The interactive debugger must appear to be a part of the run-time environment and
an integral part of the system. When an error is discovered, immediate debugging must be
possible because it may be difficult or impossible to reproduce the program failure in some other
environment or at some other time.
Consistency with security and integrity components: Users
need to be able to debug in a production environment. When an application fails during a
production run, work dependent on that application stops. Since the production environment is
often quite different from the test environment, many program failures cannot be repeated outside
the production environment. The debugger must also exist in a way that is consistent with the security
and integrity components of the system. Use of the debugger must be subject to the normal
authorization mechanisms and must leave the usual audit trails. An unauthorized user must
not be able to access any data or code. It must not be possible to use the debugger to interfere with any
aspect of system integrity.
Coordination with existing and future systems: The debugger must
coordinate its activities with those of existing and future language compilers and interpreters. It is
assumed that debugging facilities in existing languages will continue to exist and be maintained.
The requirement for a cross-language debugger assumes that such a facility would be installed as an
alternative to the individual language debuggers.
USER-INTERFACE CRITERIA
The interactive debugging system should be user friendly. The facilities of the
debugging system should be organized into a few basic categories of functions which
should closely reflect common user tasks.
Full screen displays and windowing systems
The user interaction should make use of full-screen displays and windowing systems.
The advantage of such an interface is that information can be displayed and
changed easily and quickly.
Menus:
With menus and full-screen editors, the user has far less information to enter and
remember.
It should be possible to go directly to the menus without having to retrace an entire
hierarchy.
When a full-screen terminal device is not available, the user should have an equivalent
action in a linear debugging language, provided through commands.
Command language:
The command language should have a clear, logical, simple syntax. Parameter names
should be consistent across the set of commands.
Parameters should automatically be checked for errors in type and range of values.
Defaults should be provided for parameters.
The command language should minimize punctuation such as parentheses, slashes, and
special characters.
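These criteria can be illustrated with a small command parser. Everything in this sketch is an assumption made for illustration: the commands step and break, their parameters, and the name=value syntax are hypothetical and do not describe any real debugger.

```python
# Assumed command table: parameter name -> (type, default, (low, high)).
# A default of None marks a required parameter.
COMMANDS = {
    "step": {"count": (int, 1, (1, 1000))},
    "break": {"line": (int, None, (1, 99999))},
}

def parse(command_line):
    """Parse 'name param=value ...', applying defaults and type/range checks."""
    name, *words = command_line.split()
    spec = COMMANDS[name]                    # KeyError for an unknown command
    params = {p: default for p, (_, default, _) in spec.items()}
    for word in words:
        key, _, text = word.partition("=")
        if key not in spec:
            raise ValueError("unknown parameter %r for %s" % (key, name))
        kind, _, (low, high) = spec[key]
        value = kind(text)                   # ValueError on a type mismatch
        if not low <= value <= high:
            raise ValueError("%s must be between %d and %d" % (key, low, high))
        params[key] = value
    missing = [p for p, v in params.items() if v is None]
    if missing:
        raise ValueError("missing required parameter(s): %s" % missing)
    return name, params
```

With this table, parse("step") yields ("step", {"count": 1}) through the default, while parse("step count=0") is rejected by the range check. The only punctuation the user types is the equals sign.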
On Line HELP facility
A good interactive system should have an on-line HELP facility that provides
help for all menu options.
Help should be available from any state of the debugging system.