
CS 6103 SYSTEM PROGRAMMING

Dr. Shamama Anwar


Assistant Professor
Department of Computer Science and Engineering,
BIT, Mesra
Module – II & III
1. Basic Assembler Functions
2. Machine – Dependent Assembler Features
3. Machine – Independent Assembler Features
4. Assembler Design Options
5. Implementation Examples.

2
Introduction to Assemblers
• Fundamental functions
• Translating mnemonic operation codes to their machine language equivalents
• Assigning machine addresses to symbolic labels

• Machine dependency
• Different machine instruction formats and codes

Role of Assembler
Source Program → Assembler → Object Code → Linker → Executable Code → Loader

3
Example Program
Purpose
• reads records from input device (code F1)

• copies them to output device (code 05)

• at the end of the file, writes EOF on the output device, then RSUB to the
operating system

4
Example
Read Record into Buffer

        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096

Read Record into Buffer and calculate length

        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096
LENGTH  RESW  1

Subroutine

RDREC:
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
        RSUB
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096
LENGTH  RESW  1

5
Example
Write Record from Buffer

        LDX   ZERO
WL:     TD    OP
        JEQ   WL
        LDCH  BUFFER, X
        WD    OP
        TIX   LENGTH
        JLT   WL
ZERO    WORD  0
OP      BYTE  X'3'
BUFFER  RESB  4096
LENGTH  RESW  1

Subroutine

WRREC:
        LDX   ZERO
WL:     TD    OP
        JEQ   WL
        LDCH  BUFFER, X
        WD    OP
        TIX   LENGTH
        JLT   WL
        RSUB
ZERO    WORD  0
OP      BYTE  X'3'
BUFFER  RESB  4096
LENGTH  RESW  1

6
Example
COPY:
        STL   RADR
CL:     JSUB  RDREC
        LDA   LENGTH
        COMP  ZERO
        JEQ   EF
        JSUB  WRREC
        J     CL
EF:     LDA   EOF
        STA   BUFFER
        LDA   THREE
        STA   LENGTH
        JSUB  WRREC
        LDL   RADR
        RSUB
EOF     BYTE  C'EOF'
THREE   WORD  3
ZERO    WORD  0
RADR    RESW  1
LENGTH  RESW  1
BUFFER  RESB  4096

RDREC:
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
        RSUB
IP      BYTE  X'7'
MAX     WORD  4096

WRREC:
        LDX   ZERO
WL:     TD    OP
        JEQ   WL
        LDCH  BUFFER, X
        WD    OP
        TIX   LENGTH
        JLT   WL
        RSUB
OP      BYTE  X'3'

7
Basic Assembler Functions
Assembler Directives

• Pseudo-Instructions
• Not translated into machine instructions
• Providing information to the assembler

• Basic assembler directives


• START
• END
• BYTE
• WORD
• RESB
• RESW

8
SIC Assembly Program
[Figure: SIC assembly listing with its columns annotated: line numbers (for reference), address labels, mnemonic opcodes, operands, comments]

9
SIC Assembly Program
Indicates comment lines

10
Assembler’s Function
• Convert mnemonic operation codes to their machine language equivalents.

• Convert symbolic operands to their equivalent machine addresses.

• Build the machine instructions in the proper format.

• Convert the data constants to internal machine representations.

• Write the object program and the assembly listing.

11
SIC Example Program
[Figure: SIC example program listing, annotated to show the assembler directives and the machine addresses (hexadecimal)]

12
SIC Example Program

13
Object Program
• Header
Col.1 H
Col.2~7 Program name
Col.8~13 Starting address of object program (hex)
Col.14~19 Length of object program in bytes (hex)
• Text
Col.1 T
Col.2~7 Starting address in this record (hex)
Col.8~9 Length of object code in this record in bytes (hex)
Col.10~69 Object code (hex)
• End
Col.1 E
Col.2~7 Address of first executable instruction (hex)
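
The record layouts above can be sketched as three small formatting functions. This is a minimal illustration, not the assembler's actual output routine; the function names are hypothetical, and the sample values in the usage note are taken from the COPY program only as an example.

```python
def header_record(name: str, start: int, length: int) -> str:
    """H record: col 1 'H', cols 2-7 name, cols 8-13 start, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """T record: col 1 'T', cols 2-7 start, cols 8-9 byte count, cols 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 60 hex digits (30 bytes)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_exec: int) -> str:
    """E record: col 1 'E', cols 2-7 address of first executable instruction."""
    return f"E{first_exec:06X}"
```

For example, header_record("COPY", 0x1000, 0x107A) pads the name to six columns and writes both numbers as six hex digits.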

14
Object Program

15
Difficulties: Forward Reference
• Forward reference: reference to a label that is defined later in the program.

Loc   Label   Operator  Operand
1000  FIRST   STL       RETADR
1003  CLOOP   JSUB      RDREC
…     …       …         …
1012          J         CLOOP
…     …       …         …
1033  RETADR  RESW      1

16
Two Pass Assembler
• Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
• Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
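
Pass 1 can be sketched as a loop that advances a location counter and records each label. This is a minimal sketch under stated assumptions: source lines are already split into (label, opcode, operand) tuples, the OPTAB here is a tiny hypothetical subset, and error handling is reduced to exceptions.

```python
# Instruction lengths for a few SIC mnemonics (every SIC instruction is 3 bytes).
OPTAB = {"STL": 3, "JSUB": 3, "J": 3, "LDA": 3, "RSUB": 3}


def pass1(lines):
    """Return (SYMTAB, program length) for (label, opcode, operand) tuples."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)   # START operand is hexadecimal
            if label:
                symtab[label] = locctr
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        if opcode in OPTAB:
            locctr += OPTAB[opcode]
        elif opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            # C'...' is one byte per character, X'...' one byte per two hex digits
            body = operand[2:-1]
            locctr += len(body) if operand[0] == "C" else len(body) // 2
        else:
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, locctr - start
```

Note that a forward reference such as JSUB RDREC causes no trouble here: Pass 1 only needs the instruction length, and the operand is resolved in Pass 2.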

17
Data Structures
• Operation Code Table (OPTAB)
• Symbol Table (SYMTAB)
• Location Counter(LOCCTR)

18
Location Counter (LOCCTR)
• A variable that is used to help in the assignment of addresses, i.e., LOCCTR gives the address of the associated label.

• LOCCTR is initialized to the beginning address specified in the START statement.

• After each source statement is processed during pass 1, the length of the assembled instruction or data area to be generated is added to LOCCTR.

19
Operation Code Table (OPTAB)
• Contents:
• Mnemonic operation codes
• Machine language equivalents
• Instruction format and length
• During pass 1:
• Validate operation codes in source program
• Find the instruction length to increase LOCCTR
• During pass 2:
• Determine the instruction format
• Translate the operation codes to their machine language equivalents
• Implementation: a static hash table with the mnemonic operation code as key (entries are not normally added to or deleted from it).
• A hash table organization is particularly appropriate because OPTAB is searched for every source statement.

20
SYMTAB
• Contents:
• Label name
• Label address
• Flags (to indicate error conditions)
• Data type or length
• During pass 1:
• Store label name and assigned address (from LOCCTR) in SYMTAB
• During pass 2:
• Symbols used as operands are looked up in SYMTAB
• Implementation:
• a dynamic hash table for efficient insertion and retrieval
• The hash function should perform well with non-random keys (LOOP1, LOOP2).

Example SYMTAB contents (addresses in hex):

Symbol  Address
COPY    1000
FIRST   1000
CLOOP   1003
ENDFIL  1015
EOF     102A
THREE   102D
ZERO    1030
RETADR  1033
LENGTH  1036
BUFFER  1039
RDREC   2039

21
Two Pass Assembler
[Figure: Source program → Pass 1 → Intermediate file → Pass 2 → Object codes; both passes consult OPTAB and SYMTAB]

Algorithm for Pass 1 and Pass 2

22
Assembler Pass 1

23
Assembler Pass 2

24
Assembler Design
• Machine Dependent Assembler Features
• instruction formats and addressing modes (SIC/XE)
• program relocation
• Machine Independent Assembler Features
• literals
• symbol-defining statements
• expressions
• program blocks
• control sections and program linking

25
2. Machine Dependent Assembler Features
The Differences Between the SIC and SIC/XE Programs

• Register-to-register instructions are used whenever possible to improve execution speed.
• Fetching a value stored in a register is much faster than fetching it from the memory.
• Immediate addressing mode is used whenever possible.
• Operand is already included in the fetched instruction. There is no need to fetch the operand from the memory.
• Indirect addressing mode is used whenever possible.
• Just one instruction rather than two is enough.

26
2. Machine Dependent Assembler Features
Instruction Format and Addressing Mode

• SIC/XE
• PC-relative or Base-relative addressing: op m
• Indirect addressing: op @m
• Immediate addressing: op #c
• Extended format: +op m
• Indexed addressing: op m,X
• register-to-register instructions
• larger memory -> multi-programming (program allocation)

27
A SIC/XE Program

28
A SIC/XE Program

29
A SIC/XE Program

30
A SIC/XE Program

31
Generate Relocatable Programs
Let the assembled program start at address 0 so that later it can be easily moved to any place in the physical memory.

32
33
34
Relative Addressing Modes

• PC-relative or base-relative addressing mode is preferred over direct addressing mode.
• Saves one byte by using format 3 rather than format 4.
• Reduces program storage space
• Reduces instruction fetch time
• Makes relocation easier.

35
PC or Base-Relative Modes
• Format 3: 12-bit displacement field (in total 3 bytes)
• Base-relative: 0~4095
• PC-relative: -2048~2047
• Format 4: 20-bit address field (in total 4 bytes)
• The displacement needs to be calculated so that when the
displacement is added to PC (which points to the following
instruction after the current instruction is fetched) or the base
register (B), the resulting value is the target address.
• If the displacement cannot fit into 12 bits, format 4 then needs
to be used.
• Bit e needs to be set 1 to indicate format 4.
• A programmer must specify the use of format 4 by putting a
+ before the instruction. Otherwise, it will be treated as an
error.
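
The choice between the two relative modes can be sketched as follows. This is a minimal sketch, assuming the assembler tries PC-relative first and falls back to base-relative only if a BASE value is known; the function name and argument order are hypothetical.

```python
def choose_displacement(target, pc, base=None):
    """Return (b, p, disp) for a format 3 instruction, or raise if neither fits."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return 0, 1, disp & 0xFFF      # PC-relative, 12-bit two's complement
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, target - base     # base-relative, unsigned 12 bits
    raise ValueError("displacement does not fit in 12 bits; format 4 (+op) is required")
```

Here pc is the address of the next instruction, since the PC has already advanced when the displacement is applied.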
36
Base-Relative vs. PC-Relative
• The difference between PC-relative and base-relative addressing is that the assembler knows the value of the PC when it assembles an instruction in PC-relative mode, but it does not know the value of the base register when it assembles an instruction in base-relative mode.
• Therefore, the programmer must tell the assembler the value of register B.
• This is done through the use of the BASE directive.
• The programmer must also load the appropriate value into register B explicitly (for example, with LDB).
• Another BASE directive can appear later; it tells the assembler to change its notion of the current value of B.
• NOBASE tells the assembler that base-relative addressing should no longer be used.

37
PC-Relative Example - 1
Line  Loc   Source statement       Object code
10    0000  FIRST  STL  RETADR     17202D
12    0003

op (6)    n i x b p e    disp (12)
(14)16    1 1 0 0 1 0    (02D)16

Binary:   0001 0111   0010 0000   0010 1101
Hex:      17          20          2D

Displacement = RETADR - PC = 0030 - 0003 = 002D (hex)

After this instruction is fetched and before it is executed, the PC holds 0003.

38
PC-Relative Example - 2
Line  Loc   Source statement       Object code
40    0017         J    CLOOP      3F2FEC
45    001A         …….

op (6)    n i x b p e    disp (12)
(3C)16    1 1 0 0 1 0    (FEC)16

Binary:   0011 1111   0010 1111   1110 1100
Hex:      3F          2F          EC

Displacement = CLOOP - PC = 0006 - 001A = -14 (hex), which is FEC in 12-bit two's complement
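
The bit packing used in the PC-relative examples can be checked with a short function. This is a sketch only: the function name is hypothetical, and it assumes the opcode is given as the full 8-bit value whose low two bits are replaced by n and i.

```python
def encode_format3(opcode, n, i, x, b, p, e, disp):
    """Pack a format 3 instruction (op, nixbpe, 12-bit disp) into 6 hex digits."""
    word = ((opcode & 0xFC) << 16) | (n << 17) | (i << 16)
    word |= (x << 15) | (b << 14) | (p << 13) | (e << 12) | (disp & 0xFFF)
    return f"{word:06X}"
```

Calling encode_format3(0x14, 1, 1, 0, 0, 1, 0, 0x02D) reproduces the object code of STL RETADR, and passing disp=-0x14 for J CLOOP shows how the mask produces the two's complement value FEC.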

39
Base-Relative Example

Line  Loc   Source statement        Object code
      0003          LDB   #LENGTH
                    BASE  LENGTH
      0033  LENGTH  RESW  1
      0036  BUFFER  RESB  4096
160   104E          STCH  BUFFER, X   57C003

op (6)    n i x b p e    disp (12)
(54)16    1 1 1 1 0 0    (003)16

Binary:   0101 0111   1100 0000   0000 0011
Hex:      57          C0          03

Displacement = BUFFER - B = 0036 - 0033 = 3

40
Immediate Addressing Example - 1
Line  Loc   Source statement    Object code
55    0020  LDA   #3            010003

op (6)    n i x b p e    disp (12)
(00)16    0 1 0 0 0 0    (003)16

Line  Loc   Source statement    Object code
133   103C  +LDT  #4096         75101000

op (6)    n i x b p e    address (20)
(74)16    0 1 0 0 0 1    (01000)16
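
The format 4 layout differs only in its width and in the e bit. The following sketch (hypothetical function name; b and p are assumed to be 0, as in the +LDT example) packs the same fields into 32 bits.

```python
def encode_format4(opcode, n, i, x, address):
    """Pack a format 4 instruction (op, nixbpe with e=1, 20-bit address)."""
    word = ((opcode & 0xFC) << 24) | (n << 25) | (i << 24)
    word |= (x << 23) | (1 << 20) | (address & 0xFFFFF)   # bit 20 is e
    return f"{word:08X}"
```

With opcode 0x74, n=0, i=1, x=0 and address 4096 this yields the object code shown for +LDT #4096.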

41
Indirect Addressing Example
• The target address is computed as usual (either PC-relative or base-relative)
• We only need to set n = 1 and i = 0 to indicate that the content stored at this location is the address of the operand, not the operand itself.

Line  Loc   Source statement    Object code
70    002A  J     @RETADR       3E2003

op (6)    n i x b p e    disp (12)
(3C)16    1 0 0 0 1 0    (003)16

Binary:   0011 1110   0010 0000   0000 0011
Hex:      3E          20          03

Displacement = RETADR - PC = 0030 – 002D = 0003

42
The Object Code

43
The Object Code

44
The Object Code

45
Program Relocation
• The SIC program specifies that it must be loaded at address 1000 for correct execution. This restriction is too inflexible for the loader.

• If the program is loaded at a different address, say 2000, its memory references will access the wrong data.

• Thus, we want to make programs relocatable so that they can be loaded and executed correctly anywhere in memory.

46
Why Program Relocation
• To increase the productivity of the machine

• Want to load and run several programs at the same time (multiprogramming)

• Must be able to load programs into memory wherever there is room

• Actual starting address of the program is not known until load time

47
Absolute Program
• Program with starting address specified at assembly time

• Example: SIC assembly program

• The addresses may be invalid if the program is loaded somewhere else.

48
Absolute Program

49
What Needs to be Relocated
• Need to be modified:
• The address portion of those instructions that use absolute (direct)
addresses.

• Need not be modified:


• Register-to-register instructions (no memory references)
• PC or base-relative addressing (relative displacement remains the same
regardless of different starting addresses)
• Immediate addressing

50
How to Relocate Addresses
• For Assembler
• For an address label, its address is assigned relative to the start of the
program (that’s why START 0)
• Produce a modification record to store the starting location and the
length of the address field to be modified.

• For loader
• For each modification record, add the actual beginning address of the
program to the address field at load time.

51
Relocatable Program

• Modification Record
Col.1 M
Col.2~7 Starting location of the address field to be modified, relative to the beginning of the program
Col.8~9 Length of the address field to be modified, in half-bytes
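
How a loader might apply such records can be sketched as below. This is an illustration under assumptions: memory is a bytearray holding the program loaded at offset 0 of the array, each record is an (offset, half_bytes) pair, and the function name is hypothetical.

```python
def relocate(memory: bytearray, mod_records, load_address: int) -> None:
    """Add load_address to each address field named by a modification record."""
    for offset, half_bytes in mod_records:
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(memory[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1      # low-order half-bytes hold the address
        patched = (field & ~mask) | ((field + load_address) & mask)
        memory[offset:offset + nbytes] = patched.to_bytes(nbytes, "big")
```

With an odd length such as 5 half-bytes, the field starts in the middle of a byte; the mask keeps the high half-byte (the x, b, p, e flags of a format 4 instruction) unchanged.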

52
The Relocatable Object Code

53
Machine Independent Assembler Features
•Literals
•Symbol Defining Statement
•Expressions
•Program Blocks
•Control Sections and Program Linking

54
Machine Independent Assembler Features
• Features are not closely related to machine architecture.

• More related to issues about:


• Programmer convenience
• Software environment

• Common examples:
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
• Control sections

• Assembler directives are widely used to support these features

55
Machine Independent Assembler Features
Literals
• Literal is equivalent to:
• Define a constant explicitly and assign an address label for it
• Use the label as the instruction operand

• Why use literals:


• To avoid defining the constant somewhere and making up a label for it
• Instead, to write the value of a constant operand as a part of the
instruction

• How to use literals:


• A literal is identified with the prefix =, followed by a specification of the
literal value

56
Machine Independent Assembler Features
Literals: Example

Without literals:

ENDFIL  LDA   EOF
        …….
EOF     BYTE  C'EOF'

RLOOP   TD    INPUT
        …….
INPUT   BYTE  X'F1'

With literals:

ENDFIL  LDA   =C'EOF'

RLOOP   TD    =X'F1'

57
Original Program

58
Program using Literals

59
Literals vs. Immediate Operands
• Immediate Operands
• The operand value is assembled as part of the machine instruction
55 0020 LDA #3 010003

• Literals
• The assembler generates the specified value as a constant at some other
memory location

45 001A ENDFIL LDA =C’EOF’ 032010

• Literal pools
• Normally literals are placed into a pool at the end of the program
• In some cases, it is desirable to place literals into a pool at some other
location in the object program
• assembler directive LTORG
• reason: keep the literal operand close to the instruction

60
Object Program Using Literal

61
Original Program

62
Using Literal

63
Object Program Using Literal

64
Duplicate Literals
• Duplicate literals:
• The same literal used more than once in the program
• Only one copy of the specified value needs to be stored

215 1062 WLOOP TD =X'05'
…….
230 106B WD =X'05'

• How to recognize duplicate literals
• Compare the character strings defining them
• Easier to implement, but has a potential problem.
• Compare the generated data values
• Better, but increases the complexity of the assembler
• For example, =C'EOF' and =X'454F46' generate the same value

65
Problem of Duplicate-Literal Recognition using
Character Strings
• Some literals have the same name but different values

• For example, a literal whose value depends on its location in the program
• The value of the location counter is denoted by *
BASE *
LDB =*
• The literal =* used repeatedly in the program has the same name but different values

• All literals of this kind have to be stored in the literal pool

66
Implementation of Literal
• Data structure: a literal table LITTAB
• Literal name
• Operand value and length
• Address

• LITTAB is often organized as a hash table, using the literal name or value as the key

67
Implementation of Literal
• Pass 1
• As each literal operand is recognized
• Search the LITTAB for the specified literal name or value
• If the literal is already present, no action is needed
• Otherwise, the literal is added to LITTAB (store the name, value, and length,
but not address)
• As LTORG or END is encountered
• Scan the LITTAB
• For each literal with empty address field, assign the address and update the
LOCCTR accordingly

• Pass 2
• As each literal operand is recognized
• Search the LITTAB for the specified literal name or value
• Use the associated address as the operand of the instruction
• As LTORG or END is encountered
• insert the data values of the literals in the object program
• Modification record is generated if necessary
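
The two-pass handling of literals can be sketched with a small table keyed by the generated value, so that =C'EOF' and =X'454F46' share one entry. This is a simplified sketch: the class and method names are hypothetical, only C and X literals are supported, and location-dependent literals such as =* are deliberately not handled.

```python
def literal_bytes(literal: str) -> bytes:
    """Convert =C'...' or =X'...' to the bytes the assembler would generate."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {literal}")


class LitTab:
    def __init__(self):
        self.entries = {}            # value bytes -> address (None until pooled)

    def add(self, literal: str) -> None:
        """Pass 1: record a literal operand; duplicates by value are ignored."""
        self.entries.setdefault(literal_bytes(literal), None)

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: give unplaced literals addresses; return new LOCCTR."""
        for value in list(self.entries):
            if self.entries[value] is None:
                self.entries[value] = locctr
                locctr += len(value)
        return locctr

    def address_of(self, literal: str) -> int:
        """Pass 2: the address to use as the instruction operand."""
        return self.entries[literal_bytes(literal)]
```

Keying on the value is the "better but more complex" option from the duplicate-literal slide; keying on the literal's text instead would be a one-line change in add and address_of.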
68
Symbol-Defining Statements
• How to define symbols and their values

• Address label
• The label is the symbol name and the assigned address is its value
FIRST STL RETADR

• Assembler directive EQU


symbol EQU value
• This statement enters the symbol into SYMTAB and assigns to it the
value specified
• The value can be a constant or an expression

• Assembler directive ORG


ORG value

69
Use of EQU
• Improves program readability and makes it easier to find and change
constant values

+LDT #4096
MAXLEN EQU 4096
+LDT #MAXLEN

• To define mnemonic names for registers

A EQU 0
X EQU 1
BASE EQU R1
INDEX EQU R2

70
Example of ORG
• Indirect value assignment:
ORG value
• When ORG is encountered, the assembler resets its LOCCTR to the specified value
• ORG affects the values of all labels defined until the next ORG
• If the previous value of LOCCTR is automatically remembered, we can return to the normal use of LOCCTR by simply writing
ORG
• Data structure: a symbol table (STAB) in which each entry contains
• SYMBOL: 6 bytes
• VALUE: 3 bytes (one word)
• FLAGS: 2 bytes
• We want to be able to refer to every field of each entry

71
Use of ORG

Offsets from STAB

• We can fetch the VALUE field by


LDA VALUE,X
X = 0, 11, 22, … for each entry

73
Use of ORG
Set the LOCCTR to STAB

Size of field
more meaningful

Restore the LOCCTR to its previous


value
Or only use ORG

74
Forward-Reference Problem
• Forward reference is not allowed for EQU and ORG.

• That is, all terms in the value field must have been defined
previously in the program.

• The reason is that all symbols must have been defined during
Pass 1 in a two-pass assembler.

Allowed

Not allowed

75
Forward-Reference Problem

Not allowed

Not allowed

76
ORG Example
• Using EQU statements

STAB    RESB  1100
SYMBOL  EQU   STAB
VALUE   EQU   STAB+6
FLAGS   EQU   STAB+9

• Using ORG statements

STAB    RESB  1100
        ORG   STAB
SYMBOL  RESB  6
VALUE   RESW  1
FLAGS   RESB  2
        ORG   STAB+1100

77
Expressions
• A single term as an instruction operand can be replaced by an
expression.

STAB RESB 1100

STAB RESB 11*100

STAB RESB (6+3+2)*MAXENTRIES

• The assembler has to evaluate the expression to produce a single operand address or value.

78
Expressions
• Expressions consist of
• Operator
• +,-,*,/ (division is usually defined to produce an integer result)
• Individual terms
• Constants
• User-defined symbols
• Special terms, e.g., *, the current value of LOCCTR

79
Relocation Problem in Expressions
• Values of terms can be
• Absolute (independent of program location)
• constants
• Relative (to the beginning of the program)
• Address labels
• * (value of LOCCTR)
• Expressions can be
• Absolute
• Only absolute terms
• Relative terms in pairs with opposite signs for each pair
• Relative
• All the relative terms except one can be paired as described in
“absolute”. The remaining unpaired relative term must have a
positive sign.
• No relative terms may enter into a multiplication or division operation
• Expressions that do not meet the conditions of either “absolute” or
“relative” should be flagged as errors.
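
The pairing rule above is equivalent to summing the signs of the relative terms: a net count of 0 means absolute, +1 means relative, and anything else is an error. The sketch below assumes expressions contain only + and -, given as (sign, name) pairs, and that any name missing from the symbol-type table is a constant; all names are hypothetical.

```python
def classify(terms, symtab):
    """terms: list of (sign, name), sign is +1 or -1; symtab: name -> 'A' or 'R'.

    Returns 'absolute' or 'relative'; raises ValueError for an illegal expression.
    """
    net = sum(sign for sign, name in terms if symtab.get(name, "A") == "R")
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"
    raise ValueError("expression is neither absolute nor relative")
```

Multiplication and division are left out on purpose: per the rule above, a relative term may not take part in them at all, so a fuller version would reject such terms before counting.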

80
Expressions
• Expressions can be classified as absolute expressions or relative
expressions
MAXLEN EQU BUFEND-BUFFER

BUFEND and BUFFER are both relative terms, representing addresses within the program, but the expression BUFEND-BUFFER represents an absolute value.

• When relative terms are paired with opposite signs, the dependency on the program starting address is canceled out; the result is an absolute value

81
Absolute Expressions

• Relative term or expression implicitly represents (S+r)


• S: the starting address of the program
• r: value of the term or expression relative to S
• For example
BUFFER: S+r1
BUFEND: S+r2
• The expression, BUFEND-BUFFER, is absolute.
MAXLEN = (S+r2)-(S+r1) = r2-r1 (no S here)
MAXLEN means the length of the buffer area

82
Absolute Expressions
• Illegal expressions:

BUFEND+BUFFER
100-BUFFER
3*BUFFER
because they represent neither absolute values nor locations within the program

83
Absolute or Relative
• To determine the type of an expression, we must keep track of the types of all
symbols defined in the program.
• We need a “flag” in the SYMTAB for indication.

84
Program Blocks
• Program blocks
• refer to segments of code that are rearranged within a single object program
unit
USE [blockname]

• At the beginning, statements are assumed to be part of the unnamed (default) block

• If no USE statements are included, the entire program belongs to this single
block

• Each program block may actually contain several separate segments of the
source program

85
Program Block Example
Default block.

86
Program Block Example

Use the default block.

87
Program Blocks - Implementation
Pass 1:
• Maintain a separate location counter for each program block.
• The location counter for a block is initialized to 0 when the block
first begins.
• The current value of this location counter is saved when
switching to another block, and the saved value is restored when
resuming a previous block.
• Thus, during pass 1, each label is assigned an address that is
relative to the beginning of the block that contains it.
• After pass 1, the latest value of the location counter for each
block indicates the length of that block.
• The assembler then can assign to each block a starting address
in the object program.

88
Program Blocks - Implementation
• Pass 2
• When generating object code, the assembler needs the address for each symbol relative to the start of the object program (not the start of an individual program block)
• This can be easily done by adding the location of the symbol (relative to the start of its block) to the assigned block starting address.

89
Example

There is no block number for MAXLEN. This is because MAXLEN is an absolute symbol.

90
Symbol Table

0006 LDA LENGTH

Consider the symbol LENGTH with relative address 0003 in program block 1 (CDATA).
Starting address for CDATA is 0066.
TA = 0003+0066=0069.
Displacement = TA – PC = 0069 – 0009 = 60
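
The address arithmetic in this example can be sketched in two steps: assign block starting addresses from the block lengths found in Pass 1, then add a symbol's block-relative offset to its block's start. The function names are hypothetical, and the block lengths used in the usage note (other than the CDATA start of 0066 stated above) are assumed for illustration.

```python
def block_starts(block_lengths):
    """Assign each block a starting address, in order of first appearance."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """symtab maps symbol -> (block name, offset within that block)."""
    block, offset = symtab[symbol]
    return starts[block] + offset
```

For instance, with lengths {"(default)": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000}, CDATA starts at 0066, so LENGTH at offset 0003 resolves to 0069, and the displacement from PC = 0009 is 0060 (hex).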

92
Pass 1 of program blocks
Modify the assembler Pass 1 algorithm to handle program blocks

93
Pass 2 of program blocks
Modify the assembler Pass 2 algorithm to handle program blocks

94
Control Sections and Program Linking
Control Sections

• A control section is a part of the program that maintains its identity after assembly.
• Each such control section can be loaded and relocated independently of the others.
• Different control sections are often used for subroutines or other logical subdivisions of a program.
• The programmer can assemble, load, and manipulate each of these control sections separately.
• A new control section is started with the assembler directive CSECT

95
Control Sections and Program Linking
Program Linking
• Instructions in one control section may need to refer to instructions
or data located in another control section.
• Thus, program (actually, control section) linking is necessary.
• Because control sections are independently loaded and relocated,
the assembler is unable to know a symbol’s address at assembly
time. This job can only be delayed and performed by the loader.
• We call the references that are between control sections “external
references”.
• The assembler generates information for each external reference
that will allow the loader to perform the required linking.

96
External Definition and References
• External definition
EXTDEF name [,name]
• EXTDEF names symbols that are defined in this control section and may be
used by other sections

• External reference
EXTREF name [,name]
• EXTREF names symbols that are used in this control section and are defined
elsewhere

97
Control Section Example

Default control section

98
A new control section

99
A new control section

100
Implementation
• The assembler must include information in the object program that will cause the
loader to insert proper values where they are required

Define record
Col. 1 D
Col. 2-7 Name of external symbol defined in this
control section
Col. 8-13 Relative address within this control
section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for
other external symbols
Refer record
Col. 1 R
Col. 2-7 Name of external symbol referred to in
this control section
Col. 8-73 Name of other external reference symbols

101
Modification Record
• The control section name is automatically an external symbol, i.e. it is available
for use in Modification records.

Modification record

Col. 1 M
Col. 2-7 Starting address of the field to be
modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in
half-bytes (hexadecimal)
Col. 10 Modification flag (+ or -)
Col.11-16 External symbol whose value is to be
added to or subtracted from the indicated
field

102
Object Program

103
Assembler Design Options

• One Pass Assembler

• Multi Pass Assembler

104
One Pass Assembler
• Main problem: forward references
• data items
• labels on instructions
• Solution
• data items: require all such areas be defined before they are
referenced
• labels on instructions: no good solution

105
Program Example

106
One Pass Assembler
• Two types of one-pass assembler

• load-and-go
• produces object code directly in memory for immediate execution
• No loader is needed
• Can save time for scanning the source code again

• the other
• produces usual kind of object code for later execution

107
Load-and-go Assembler
• Characteristics

• Avoids the overhead of writing the object program out and reading it back

• Being one-pass, it also avoids the overhead of an additional pass over the source program

• For a load-and-go assembler, the actual address must be known at assembly time, so we can use an absolute program

108
Forward Reference in One-pass Assembler
For any symbol that has not yet been defined

• omit the address translation

• insert the symbol into SYMTAB, and mark this symbol undefined

• the address of the operand field that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry

• when the definition for a symbol is encountered, the proper address for the symbol is inserted into any previously generated instructions, according to the forward reference list
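
The forward-reference list can be sketched as a small backpatching table. This is a simplified model, not the load-and-go assembler itself: memory is assumed to be a dict from operand-field address to the address value stored there, and the class and method names are hypothetical.

```python
class OnePassSymtab:
    def __init__(self):
        self.value = {}      # symbol -> address, once defined
        self.pending = {}    # undefined symbol -> operand-field addresses to patch

    def reference(self, symbol, field_address, memory):
        """Use a symbol as an operand; defer the address if it is undefined."""
        if symbol in self.value:
            memory[field_address] = self.value[symbol]
        else:
            memory[field_address] = 0            # placeholder, patched later
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, address, memory):
        """Record a definition and patch every instruction that was waiting."""
        self.value[symbol] = address
        for field_address in self.pending.pop(symbol, []):
            memory[field_address] = address

    def undefined(self):
        """Symbols still undefined at the end of the program (errors)."""
        return sorted(self.pending)
```

Anything still returned by undefined() when END is reached corresponds to the entries "still marked with *" on the next slide.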

109
Forward Reference in One-pass Assembler
• At the end of the program
• any SYMTAB entries that are still marked with * indicate
undefined symbols
• search SYMTAB for the symbol named in the END
statement and jump to this location to begin execution
• The actual starting address must be specified at
assembly time

110
Forward Reference in One-pass Assembler

111
Producing Object Code
• When definition of a symbol is encountered, the assembler must
generate another Text record with the correct operand address

• The loader is used to complete forward references that could not be handled by the assembler

• The object program records must be kept in their original order when
they are presented to the loader

112
Multi-Pass Assemblers
• Restriction on EQU and ORG
• no forward reference, since symbols’ value can’t be defined
during the first pass
• It is unnecessary for a multi-pass assembler to make more than
two passes over the entire program.
• Instead, only the parts of the program involving forward references
need to be processed in multiple passes.
• The method presented here can be used to process any kind of
forward references.

113
Multi-Pass Assembler Implementation
• Use a symbol table to store symbols that are not totally defined
yet.
• For an undefined symbol, in its entry,
• We store the names and the number of undefined symbols
which contribute to the calculation of its value.
• We also keep a list of symbols whose values depend on the
defined value of this symbol.
• When a symbol becomes defined, we use its value to reevaluate
the values of all of the symbols that are kept in this list.
• The above step is performed recursively.
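
The dependency bookkeeping described above can be sketched as follows. This is a minimal sketch, not the table layout from the slides: each symbol is defined by a function of its operand symbols, the class and method names are hypothetical, and the addresses in the usage note are assumed example values.

```python
class MultiPassSymtab:
    def __init__(self):
        self.values = {}       # defined symbol -> value
        self.pending = {}      # undefined symbol -> (function, operand names)
        self.dependents = {}   # symbol -> symbols whose value depends on it

    def define(self, symbol, func, operands=()):
        missing = [name for name in operands if name not in self.values]
        if missing:
            # Not fully defined yet: remember the expression and who it waits on.
            self.pending[symbol] = (func, operands)
            for name in missing:
                waiters = self.dependents.setdefault(name, [])
                if symbol not in waiters:
                    waiters.append(symbol)
            return
        self.values[symbol] = func(*(self.values[name] for name in operands))
        self.pending.pop(symbol, None)
        # Re-evaluate everything that was waiting on this symbol (recursively).
        for dependent in self.dependents.pop(symbol, []):
            if dependent in self.pending:
                self.define(dependent, *self.pending[dependent])
```

For example, defining HALFSZ as MAXLEN/2 and MAXLEN as BUFEND-BUFFER before BUFFER and BUFEND exist leaves both pending; once BUFFER = 0x1034 and BUFEND = 0x2034 are defined, MAXLEN becomes 0x1000 and HALFSZ 0x800 without rescanning the whole program.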

114
Forward Reference Example

&1 : one symbol is undefined


&2 : two symbols are undefined
115
Forward Reference Processing

But one symbol is unknown yet

Defined

Not defined yet


After first line

116
Forward Reference Processing

But two symbols are unknown yet

Now defined After second line

117
Forward Reference Processing

After third line

118
Forward Reference Processing

Start knowing values After 4’th line


119
Forward Reference Processing
Start knowing values

All symbols are defined and their values are known now.

After 5’th line

120
Implementation Examples
• Microsoft MASM Assembler
• Sun Sparc Assembler
• IBM AIX Assembler

121
Microsoft MASM Assembler
• An assembly language program is written as a collection of segments
• SEGMENT
• Each segment is defined as belonging to a particular class, CODE, DATA,
CONST, STACK
• registers: CS (code), SS (stack), DS, ES, FS, GS (Data)
• similar to program blocks in SIC
• ASSUME
• e.g. ASSUME ES:DATASEG2
• e.g. MOV AX, DATASEG2
MOV ES,AX
• similar to BASE in SIC

122
Microsoft MASM Assembler
• JUMP with forward reference
• near jump (within the code segment): 2 or 3 bytes
• far jump (to a different segment): 5 bytes
• e.g. JMP TARGET (not sure whether near / far jump)
• JMP FAR PTR TARGET
• JMP SHORT TARGET
• Pass 1: reserves 3 bytes for jump instruction
• phase error
• PUBLIC, EXTRN
• similar to EXTDEF, EXTREF in SIC

123
Sun Sparc Assembler
• Sections
• .TEXT (Executable instruction)
• .DATA (Initialized read/write data)
• .RODATA (Read only data)
• .BSS (Uninitialized data areas)
• Separate location counter is maintained for each section.
• Similar to program blocks in SIC
• Symbols
• global vs. weak
• similar to the combination of EXTDEF and EXTREF in SIC
• Delayed branches
• delayed slots (NOP)
• annulled branch instruction (A)

124
AIX Assembler
• Base relative addressing
• save instruction space, no absolute address
• base register table:
• general purpose registers can be used as base register
• easy for program relocation
• only data whose values are to be actual address needs to be modified
• e.g. USING LENGTH, 1
• USING BUFFER, 4
• Similar to BASE in SIC
• DROP

125
AIX Assembler
• Alignment
• instruction (2)
• data: halfword operand (2), fullword operand (4)
• Slack bytes
• .CSECT
• control sections: RO(read-only data), RW(read-write data),
PR(executable instructions), BS(uninitialized read/write data)
• dummy section

126
Summary
• Module 2
• Assemblers and basic functions
• A simple SIC assembler, algorithm (flowchart) and data
structures, writing object code and object program.
• Assembler Features

Machine Dependent:
• Instruction Formats and Addressing Mode
• Program Relocation (producing the M record)

Machine Independent:
• Literals (=)
• Symbol-Defining Statements (EQU, ORG)
• Expressions (Relative, Absolute)
• Program Blocks
• Control Sections and Program Linking (EXTDEF, EXTREF, D record, R record)

127
Summary

• Assemblers design options

• One Pass Assembler
• Load and go
• Generates Object Code
• Multi Pass Assembler

• Implementation Examples

128
