2
Introduction to Assemblers
• Fundamental functions
• Translating mnemonic operation codes to their machine language equivalents
• Assigning machine addresses to symbolic labels
• Machine dependency
• Different machine instruction formats and codes
Role of Assembler
Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader
3
Example Program
Purpose
• reads records from input device (code F1)
• at the end of the file, writes EOF on the output device, then RSUB to the
operating system
4
Example
Read Record into Buffer
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096
Read Record into Buffer and calculate length
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096
LENGTH  RESW  1
Subroutine
RDREC:
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
        RSUB
ZERO    WORD  0
IP      BYTE  X'7'
BUFFER  RESB  4096
MAX     WORD  4096
LENGTH  RESW  1
5
Example
Write Record from Buffer Subroutine
6
Example
COPY:
        STL   RADR
CL:     JSUB  RDREC
        LDA   LENGTH
        COMP  ZERO
        JEQ   EF
        JSUB  WRREC
        J     CL
EF:     LDA   EOF
        STA   BUFFER
        LDA   THREE
        STA   LENGTH
        JSUB  WRREC
        LDL   RADR
        RSUB
EOF     BYTE  C'EOF'
THREE   WORD  3
ZERO    WORD  0
RADR    RESW  1
LENGTH  RESW  1
BUFFER  RESB  4096
RDREC:
        LDX   ZERO
RL:     TD    IP
        JEQ   RL
        RD    IP
        COMP  ZERO
        JEQ   EXIT
        STCH  BUFFER, X
        TIX   MAX
        JLT   RL
EXIT:   STX   LENGTH
        RSUB
IP      BYTE  X'7'
MAX     WORD  4096
WRREC:
        LDX   ZERO
WL:     TD    OP
        JEQ   WL
        LDCH  BUFFER, X
        WD    OP
        TIX   LENGTH
        JLT   WL
        RSUB
OP      BYTE  X'3'
7
Basic Assembler Functions
Assembler Directives
• Pseudo-Instructions
• Not translated into machine instructions
• Providing information to the assembler
8
SIC Assembly Program
[Figure: SIC assembly program listing; callouts mark the line numbers (for reference), address labels, mnemonic opcodes, operands, and comments]
9
SIC Assembly Program
Indicates comment lines
10
Assembler’s Function
• Convert mnemonic operation codes to their machine language
equivalents.
11
SIC Example Program
[Figure: SIC example program listing; callouts mark the assembler directives and the machine addresses (hex)]
12
SIC Example Program
13
Object Program
• Header
Col.1 H
Col.2~7 Program name
Col.8~13 Starting address of object program (hex)
Col.14~19 Length of object program in bytes (hex)
• Text
Col.1 T
Col.2~7 Starting address in this record (hex)
Col.8~9 Length of object code in this record in bytes (hex)
Col.10~69 Object code (hex)
• End
Col.1 E
Col.2~7 Address of first executable instruction (hex)
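The record layouts above can be sketched as three small formatting helpers. This is a minimal illustration, not the assembler's actual output routine: the function names are hypothetical, and the sample values in the usage note are only illustrative inputs.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 program name, cols 8-13 start (hex), cols 14-19 length (hex)."""
    return "H" + name.ljust(6)[:6] + f"{start:06X}" + f"{length:06X}"


def text_record(start, object_codes):
    """Col 1 'T', cols 2-7 start (hex), cols 8-9 byte count (hex), cols 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        # Columns 10-69 leave room for 60 hex digits, i.e. 30 bytes.
        raise ValueError("a Text record holds at most 30 bytes of object code")
    return "T" + f"{start:06X}" + f"{len(body) // 2:02X}" + body


def end_record(first_executable):
    """Col 1 'E', cols 2-7 address of the first executable instruction (hex)."""
    return "E" + f"{first_executable:06X}"
```

For example, `text_record(0x1000, ["141033", "482039", "001036"])` packs three 3-byte instructions into one record whose length field is 09.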
14
Object Program
15
Difficulties: Forward Reference
• Forward reference: reference to a label that is defined later in the program.
16
Two Pass Assembler
• Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
• Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
17
Data Structures
• Operation Code Table (OPTAB)
• Symbol Table (SYMTAB)
• Location Counter (LOCCTR)
18
Location Counter (LOCCTR)
• A variable used to assign addresses: when a statement carries a label,
the current value of LOCCTR is the address given to that label.
19
Operation Code Table (OPTAB)
• Contents:
• Mnemonic operation codes
• Machine language equivalents
• Instruction format and length
• During pass 1:
• Validate operation codes in source program
• Find the instruction length to increase LOCCTR
• During pass 2:
• Determine the instruction format
• Translate the operation codes to their machine language equivalents
• Implementation: a static hash table with mnemonic operation
code as key. (entries are not normally added to or deleted from
it)
• Hash table organization is particularly appropriate.
20
SYMTAB
Example contents (label, address in hex):
COPY 1000
FIRST 1000
CLOOP 1003
ENDFIL 1015
EOF 1024
THREE 102D
ZERO 1030
RETADR 1033
LENGTH 1036
BUFFER 1039
RDREC 2039
• Contents:
• Label name
• Label address
• Flags (to indicate error conditions)
• Data type or length
• During pass 1:
• Store label name and assigned address (from LOCCTR) in SYMTAB
• During pass 2:
• Symbols used as operands are looked up in SYMTAB
• Implementation:
• a dynamic hash table for efficient insertion and retrieval
• Should perform well with non-random keys (LOOP1, LOOP2).
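How OPTAB, SYMTAB, and LOCCTR cooperate in Pass 1 can be sketched as follows. This is a simplified illustration under stated assumptions: the `pass1` and `byte_length` names, the tuple-based source format, and the small OPTAB subset are hypothetical, and every SIC instruction is taken to be 3 bytes long.

```python
# Hypothetical OPTAB subset: mnemonic -> (opcode, length in bytes).
OPTAB = {"STL": (0x14, 3), "JSUB": (0x48, 3), "LDA": (0x00, 3),
         "COMP": (0x28, 3), "JEQ": (0x30, 3), "J": (0x3C, 3)}


def byte_length(operand):
    """Length of a BYTE constant such as C'EOF' (3 bytes) or X'F1' (1 byte)."""
    kind, value = operand[0], operand[2:-1]
    return len(value) if kind == "C" else (len(value) + 1) // 2


def pass1(lines):
    """lines: list of (label, opcode, operand). Returns (symtab, program_length)."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)
            if label:
                symtab[label] = locctr       # program name, as in the table above
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr           # label gets the current LOCCTR
        if opcode in OPTAB:
            locctr += OPTAB[opcode][1]       # instruction length from OPTAB
        elif opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, locctr - start
```

With a program that begins `COPY START 1000`, `FIRST STL RETADR`, `CLOOP JSUB RDREC`, the sketch assigns FIRST the address 1000 and CLOOP the address 1003.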
21
Two Pass Assembler
Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes
Tables used by the passes: OPTAB, SYMTAB
22
Assembler Pass 1
23
Assembler Pass 2
24
Assembler Design
• Machine Dependent Assembler Features
• instruction formats and addressing modes (SIC/XE)
• program relocation
• Machine Independent Assembler Features
• literals
• symbol-defining statements
• expressions
• program blocks
• control sections and program linking
25
2. Machine Dependent Assembler Features
The Differences Between the SIC and SIC/XE Programs
26
2. Machine Dependent Assembler Features
Instruction Format and Addressing Mode
• SIC/XE
• PC-relative or Base-relative addressing: op m
• Indirect addressing: op @m
• Immediate addressing: op #c
• Extended format: +op m
• Index addressing: op m,x
• register-to-register instructions
• larger memory -> multi-programming (program allocation)
27
A SIC/XE Program
28
A SIC/XE Program
29
A SIC/XE Program
30
A SIC/XE Program
31
Generate Relocatable Programs
• Let the assembled program start at address 0 so that later it can be
easily moved to any place in the physical memory.
32
33
34
Relative Addressing Modes
35
PC or Base-Relative Modes
• Format 3: 12-bit displacement field (in total 3 bytes)
• Base-relative: 0~4095
• PC-relative: -2048~2047
• Format 4: 20-bit address field (in total 4 bytes)
• The displacement needs to be calculated so that when the
displacement is added to PC (which points to the following
instruction after the current instruction is fetched) or the base
register (B), the resulting value is the target address.
• If the displacement cannot fit into 12 bits, format 4 then needs
to be used.
• Bit e needs to be set 1 to indicate format 4.
• A programmer must specify the use of format 4 by putting a
+ before the instruction. If the displacement fits neither relative
mode and no + is given, the assembler flags an error.
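The displacement rules above can be sketched as a small selection routine. The order of attempts (PC-relative first, then base-relative) is an assumption of this sketch, and the function name and return shape are hypothetical.

```python
def choose_displacement(target, pc, base=None):
    """Pick an addressing mode for a format 3 instruction.

    target: address of the operand
    pc:     address of the NEXT instruction (PC after fetch)
    base:   value of register B as declared by BASE, or None if unknown
    Returns (mode, 12-bit displacement field) or raises if format 4 is needed.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF           # negative values in two's complement
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    raise ValueError("displacement does not fit in 12 bits; format 4 (+op) required")
```

For instance, a target of 0x30 with PC at 0x03 gives a PC-relative displacement of 0x02D, and a backward jump from PC 0x1A to 0x06 gives 0xFEC.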
36
Base-Relative vs. PC-Relative
• The difference between PC-relative and base-relative addressing
is that the assembler knows the value of PC when it assembles an
instruction in PC-relative mode, but it does not know the value
of the base register when it assembles an instruction in
base-relative mode.
• Therefore, the programmer must tell the assembler the
value of register B.
• This is done with the BASE directive.
• The programmer must also load the appropriate value into
register B; the BASE directive itself generates no code.
• Another BASE directive can appear later; it tells the
assembler to change its notion of the current value of B.
• NOBASE tells the assembler that base-relative addressing
must no longer be used.
37
PC-Relative Example - 1
10 0000 FIRST STL RETADR 17202D
12 0003
op(6)        n i x b p e   disp(12)
(14)16       1 1 0 0 1 0   (02D)16
(0001 0111) (0010 0000) (2D)16
(17)16 (20)16 (2D)16
displacement = RETADR - PC = 30 - 3 = 2D
38
PC-Relative Example - 2
40 0017 J CLOOP 3F2FEC
45 001A …….
op(6)        n i x b p e   disp(12)
(3C)16       1 1 0 0 1 0   (FEC)16
(0011 1111) (0010 1111) (EC)16
(3F)16 (2F)16 (EC)16
displacement = CLOOP - PC = 6 - 1A = -14 = FEC (12-bit two's complement)
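The bit packing used in the two PC-relative examples can be checked with a short sketch. The function name `encode_format3` is hypothetical; the layout (6-bit opcode, flags n i x b p e, 12-bit displacement) follows the breakdown shown above.

```python
def encode_format3(opcode, n, i, x, b, p, e, disp):
    """Pack a format 3 instruction into 6 hex digits.

    The low two bits of the 8-bit opcode are replaced by the n and i flags;
    x, b, p, e form the next half-byte; disp fills the remaining 12 bits.
    """
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"
```

`encode_format3(0x14, 1, 1, 0, 0, 1, 0, 0x02D)` reproduces the object code of `STL RETADR`, and the same call with opcode 0x3C and displacement 0xFEC reproduces `J CLOOP`.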
39
Base-Relative Example
40
Immediate Addressing Example - 1
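55 0020 LDA #3 010003
op(6)        n i x b p e   disp(12)
(00)16       0 1 0 0 0 0   (003)16
Format 4 case: +LDT #4096 assembles to 75101000
op(6)        n i x b p e   address(20)
(74)16       0 1 0 0 0 1   (01000)16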
41
Indirect Addressing Example
• The target address is computed as usual (either PC-relative
or BASE-relative)
• We only need to set the n bit to 1 to indicate that the content
stored at this location represents the address of the operand,
not the operand itself.
42
The Object Code
43
The Object Code
44
The Object Code
45
Program Relocation
• The SIC program specifies that it must be loaded at address
1000 for correct execution. This restriction is too inflexible for
the loader.
46
Why Program Relocation
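• To increase the productivity of the machine: several programs can share
memory at the same time (multiprogramming), so a program cannot rely on a
fixed load address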
47
Absolute Program
• Program with starting address specified at assembly time
48
Absolute Program
49
What Needs to be Relocated
• Need to be modified:
• The address portion of those instructions that use absolute (direct)
addresses.
50
How to Relocate Addresses
• For Assembler
• For an address label, its address is assigned relative to the start of the
program (that’s why START 0)
• Produce a modification record to store the starting location and the
length of the address field to be modified.
• For loader
• For each modification record, add the actual beginning address of the
program to the address field at load time.
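The loader's side of relocation can be sketched as below. This is an illustrative sketch, not a full loader: the function name is hypothetical, memory is modelled as a `bytearray`, and the field length is given in half-bytes (hex digits). When the length is odd, the field is assumed to start in the low half of its first byte.

```python
def apply_modification(memory, start, half_bytes, load_address):
    """Add load_address to the address field of `half_bytes` hex digits
    that ends at the last byte of memory[start : start + ceil(half_bytes / 2)]."""
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    relocated = ((field & mask) + load_address) & mask
    # Keep any bits outside the address field (e.g. the flag half-byte) unchanged.
    field = (field & ~mask) | relocated
    memory[start:start + n_bytes] = field.to_bytes(n_bytes, "big")
```

As a usage example, a format 4 instruction `4B101036` whose 20-bit address field starts one byte in becomes `4B106036` when the program is loaded at address 5000 (hex).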
51
Relocatable Program
• Modification Record
Col.1 M
52
The Relocatable Object Code
53
Machine Independent Assembler Features
• Literals
• Symbol Defining Statement
• Expressions
• Program Blocks
• Control Sections and Program Linking
54
Machine Independent Assembler Features
• Features are not closely related to machine architecture.
• Common examples:
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
• Control sections
55
Machine Independent Assembler Features
Literals
• Literal is equivalent to:
• Define a constant explicitly and assign an address label for it
• Use the label as the instruction operand
56
Machine Independent Assembler Features
Literals: Example
Without a literal:
RLOOP TD INPUT
…….
…….
INPUT BYTE X'F1'
With a literal:
RLOOP TD =X'F1'
57
Original Program
58
Program using Literals
59
Literals vs. Immediate Operands
• Immediate Operands
• The operand value is assembled as part of the machine instruction
55 0020 LDA #3 010003
• Literals
• The assembler generates the specified value as a constant at some other
memory location
• Literal pools
• Normally literals are placed into a pool at the end of the program
• In some cases, it is desirable to place literals into a pool at some other
location in the object program
• assembler directive LTORG
• reason: keep the literal operand close to the instruction
60
Object Program Using Literal
61
Original Program
62
Using Literal
63
Object Program Using Literal
64
Duplicate Literals
• Duplicate literals:
• The same literal used more than once in the program
• Only one copy of the specified value needs to be stored
65
Problem of Duplicate-Literal Recognition using
Character Strings
• There may be some literals that have the same name, but different values
• For example, the literal whose value depends on its location in the program
• The value of location counter denoted by *
BASE *
LDB =*
• The literal =* repeatedly used in the program has the same name, but
different values
66
Implementation of Literal
• Data structure: a literal table LITTAB
• Literal name
• Operand value and length
• Address
67
Implementation of Literal
• Pass 1
• As each literal operand is recognized
• Search the LITTAB for the specified literal name or value
• If the literal is already present, no action is needed
• Otherwise, the literal is added to LITTAB (store the name, value, and length,
but not address)
• As LTORG or END is encountered
• Scan the LITTAB
• For each literal with empty address field, assign the address and update the
LOCCTR accordingly
• Pass 2
• As each literal operand is recognized
• Search the LITTAB for the specified literal name or value
• Use the associated address as the operand of the instruction
• As LTORG or END is encountered
• insert the data values of the literals in the object program
• Modification record is generated if necessary
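The Pass 1 handling of LITTAB can be sketched as follows. The class and method names are hypothetical, and the table is keyed by literal name only, so this sketch does not handle literals such as =* whose value depends on their location.

```python
def literal_value(name):
    """Bytes for a literal such as =C'EOF' or =X'05'."""
    kind, text = name[1], name[3:-1]
    return text.encode("ascii") if kind == "C" else bytes.fromhex(text)


class LiteralTable:
    def __init__(self):
        # name -> {"value": bytes, "length": int, "address": int or None}
        self.entries = {}

    def add(self, name):
        """Called when a literal operand is recognized; duplicates are ignored."""
        if name not in self.entries:
            value = literal_value(name)
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr):
        """Called at LTORG or END: assign addresses to unplaced literals.
        Returns the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr
```

In Pass 2 the stored address would be used as the instruction operand, and the stored values would be written into the object program where the pool was placed.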
68
Symbol-Defining Statements
• How to define symbols and their values
• Address label
• The label is the symbol name and the assigned address is its value
FIRST STL RETADR
69
Use of EQU
• Improves program readability and makes it easier to find and change
constant values
+LDT #4096
can be written as
MAXLEN EQU 4096
+LDT #MAXLEN
• Defining mnemonic names for registers
A EQU 0
X EQU 1
BASE EQU R1
INDEX EQU R2
70
Example of ORG
• Indirect value assignment:
ORG value
71
Use of ORG
72
Use of ORG
73
Use of ORG
Set the LOCCTR to STAB
Size of field
more meaningful
74
Forward-Reference Problem
• Forward reference is not allowed for EQU and ORG.
• That is, all terms in the value field must have been defined
previously in the program.
• The reason is that all symbols must have been defined during
Pass 1 in a two-pass assembler.
Allowed
Not allowed
75
Forward-Reference Problem
Not allowed
Not allowed
76
ORG Example
• Using EQU statements
77
Expressions
• A single term as an instruction operand can be replaced by an
expression.
78
Expressions
• Expressions consist of
• Operator
• +,-,*,/ (division is usually defined to produce an integer result)
• Individual terms
• Constants
• User-defined symbols
• Special terms, e.g., *, the current value of LOCCTR
79
Relocation Problem in Expressions
• Values of terms can be
• Absolute (independent of program location)
• constants
• Relative (to the beginning of the program)
• Address labels
• * (value of LOCCTR)
• Expressions can be
• Absolute
• Only absolute terms
• Relative terms in pairs with opposite signs for each pair
• Relative
• All the relative terms except one can be paired as described in
“absolute”. The remaining unpaired relative term must have a
positive sign.
• No relative terms may enter into a multiplication or division operation
• Expressions that do not meet the conditions of either “absolute” or
“relative” should be flagged as errors.
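The pairing rules above reduce to counting signed relative terms, which the following sketch illustrates. It covers only sums and differences of terms; rejecting relative terms inside a multiplication or division is assumed to happen before this step. The function name and the term encoding are hypothetical.

```python
def classify(terms):
    """terms: list of (sign, kind), sign is +1 or -1, kind is 'abs' or 'rel'.

    Net relative count 0  -> every relative term is paired: absolute.
    Net relative count +1 -> one unpaired positive relative term: relative.
    Anything else is an error.
    """
    net = sum(sign for sign, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"
    raise ValueError("expression is neither absolute nor relative")
```

For example, BUFEND-BUFFER is `[(+1, "rel"), (-1, "rel")]` and classifies as absolute, while BUFEND+BUFFER and 100-BUFFER both raise an error.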
80
Expressions
• Expressions can be classified as absolute expressions or relative
expressions
MAXLEN EQU BUFEND-BUFFER
81
Absolute Expressions
82
Absolute Expressions
• Illegal expressions:
BUFEND+BUFFER
100-BUFFER
3*BUFFER
because they represent neither absolute values nor locations within the program
83
Absolute or Relative
• To determine the type of an expression, we must keep track of the types of all
symbols defined in the program.
• We need a “flag” in the SYMTAB for indication.
84
Program Blocks
• Program blocks
• refer to segments of code that are rearranged within a single object program
unit
USE [blockname]
• If no USE statements are included, the entire program belongs to this single
block
• Each program block may actually contain several separate segments of the
source program
85
Program Block Example
Default block.
86
Program Block Example
87
Program Blocks - Implementation
Pass 1:
• Maintain a separate location counter for each program block.
• The location counter for a block is initialized to 0 when the block
first begins.
• The current value of this location counter is saved when
switching to another block, and the saved value is restored when
resuming a previous block.
• Thus, during pass 1, each label is assigned an address that is
relative to the beginning of the block that contains it.
• After pass 1, the latest value of the location counter for each
block indicates the length of that block.
• The assembler then can assign to each block a starting address
in the object program.
88
Program Blocks - Implementation
• Pass 2
• When generating object code, the assembler needs the address for each
symbol relative to the start of the object program (not the start of an
individual program block)
• This can be easily done by adding the location of the symbol (relative to the
start of its block) to the assigned block starting address.
89
Example
There is no block number for MAXLEN. This is because MAXLEN is an absolute symbol.
90
Symbol Table
Consider the symbol LENGTH with relative address 0003 in program block 1 (CDATA).
Starting address for CDATA is 0066.
TA = 0003+0066=0069.
Displacement = TA – PC = 0069 – 0009 = 60
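The address calculation for program blocks can be sketched as below. The function names and the symbol-table shape are hypothetical, and the block lengths used in the usage note (other than what the CDATA starting address of 0066 implies for the default block) are illustrative assumptions.

```python
def assign_block_starts(block_lengths):
    """block_lengths: dict of block name -> length, in the order the blocks
    should appear in the object program. Returns block name -> start address."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """symtab maps symbol -> (block name, offset within that block)."""
    block, offset = symtab[symbol]
    return starts[block] + offset
```

With a default block of length 0x66 followed by CDATA, CDATA starts at 0x66, so LENGTH at offset 3 in CDATA resolves to 0x69; subtracting a PC of 0x09 gives the displacement 0x60.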
92
Pass 1 of program blocks
Modify the assembler Pass 1 algorithm to handle program blocks
93
Pass 2 of program blocks
Modify the assembler Pass 2 algorithm to handle program blocks
94
Control Sections and Program Linking
Control Sections
CSECT
95
Control Sections and Program Linking
Program Linking
• Instructions in one control section may need to refer to instructions
or data located in another control section.
• Thus, program (actually, control section) linking is necessary.
• Because control sections are independently loaded and relocated,
the assembler cannot know a symbol’s address at assembly
time. Resolving it must be deferred to the loader.
• We call the references that are between control sections “external
references”.
• The assembler generates information for each external reference
that will allow the loader to perform the required linking.
96
External Definition and References
• External definition
EXTDEF name [,name]
• EXTDEF names symbols that are defined in this control section and may be
used by other sections
• External reference
EXTREF name [,name]
• EXTREF names symbols that are used in this control section and are defined
elsewhere
97
Control Section Example
98
A new control section
99
A new control section
100
Implementation
• The assembler must include information in the object program that will cause the
loader to insert proper values where they are required
Define record
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col. 14-73 Repeat information in Col. 2-13 for other external symbols
Refer record
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Name of other external reference symbols
101
Modification Record
• The control section name is automatically an external symbol, i.e. it is available
for use in Modification records.
Modification record
Col. 1 M
Col. 2-7 Starting address of the field to be modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10 Modification flag (+ or -)
Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated field
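The Define, Refer, and Modification record layouts for control sections can be sketched as formatting helpers. The function names are hypothetical, the symbol names and addresses in the usage note are illustrative, and the sketch does not enforce the per-record column limits.

```python
def define_record(symbols):
    """symbols: list of (name, relative address). Each pair takes 12 columns."""
    return "D" + "".join(f"{name:<6.6}{address:06X}" for name, address in symbols)


def refer_record(names):
    """Each external reference name is padded to 6 columns."""
    return "R" + "".join(f"{name:<6.6}" for name in names)


def modification_record(address, half_bytes, flag, symbol):
    """Cols 2-7 address, cols 8-9 length in half-bytes, col 10 flag, cols 11-16 symbol."""
    if flag not in ("+", "-"):
        raise ValueError("modification flag must be + or -")
    return f"M{address:06X}{half_bytes:02X}{flag}{symbol:.6}"
```

For example, `modification_record(0x4, 5, "+", "RDREC")` asks the loader to add the value of RDREC to the 5 half-byte field starting at address 000004.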
102
Object Program
103
Assembler Design Options
104
One Pass Assembler
• Main problem: forward references
• data items
• labels on instructions
• Solution
• data items: require all such areas be defined before they are
referenced
• labels on instructions: no good solution
105
Program Example
106
One Pass Assembler
• Two types of one-pass assembler
• load-and-go
• produces object code directly in memory for immediate execution
• No loader is needed
• Can save time for scanning the source code again
• the other
• produces usual kind of object code for later execution
107
Load-and-go Assembler
• Characteristics
108
Forward Reference in One-pass Assembler
For any symbol that has not yet been defined
• insert the symbol into SYMTAB, and mark this symbol undefined
• when the definition for a symbol is encountered, the proper address for
the symbol is then inserted into any instructions previously generated
according to the forward reference list
109
Forward Reference in One-pass Assembler
• At the end of the program
• any SYMTAB entries that are still marked with * indicate
undefined symbols
• search SYMTAB for the symbol named in the END
statement and jump to this location to begin execution
• The actual starting address must be specified at
assembly time
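The forward reference list used by a load-and-go assembler can be sketched as follows. This is an assumption-laden illustration: the class name is hypothetical, memory is a `bytearray`, and each pending entry is the address of a 2-byte operand field whose top bit (the SIC index flag) is preserved when patching.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> operand field addresses to patch

    def reference(self, symbol, operand_address):
        """Return the symbol's address, or 0 as a placeholder while recording
        the operand field so it can be patched once the symbol is defined."""
        if symbol in self.defined:
            return self.defined[symbol]
        self.pending.setdefault(symbol, []).append(operand_address)
        return 0

    def define(self, symbol, address, memory):
        """Record the definition and patch every earlier forward reference."""
        self.defined[symbol] = address
        for loc in self.pending.pop(symbol, []):
            old = int.from_bytes(memory[loc:loc + 2], "big")
            patched = (old & 0x8000) | (address & 0x7FFF)
            memory[loc:loc + 2] = patched.to_bytes(2, "big")

    def undefined(self):
        """Symbols still undefined at the end of the program."""
        return sorted(self.pending)
```

At END, a non-empty `undefined()` list corresponds to the SYMTAB entries that are still marked as undefined.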
110
Forward Reference in One-pass Assembler
111
Producing Object Code
• When definition of a symbol is encountered, the assembler must
generate another Text record with the correct operand address
• The object program records must be kept in their original order when
they are presented to the loader
112
Multi-Pass Assemblers
• Restriction on EQU and ORG
• no forward reference, since symbols’ value can’t be defined
during the first pass
• It is unnecessary for a multi-pass assembler to make more than
two passes over the entire program.
• Instead, only the parts of the program involving forward references
need to be processed in multiple passes.
• The method presented here can be used to process any kind of
forward references.
113
Multi-Pass Assembler Implementation
• Use a symbol table to store symbols that are not totally defined
yet.
• For an undefined symbol, in its entry,
• We store the names and the number of undefined symbols
which contribute to the calculation of its value.
• We also keep a list of symbols whose values depend on the
defined value of this symbol.
• When a symbol becomes defined, we use its value to reevaluate
the values of all of the symbols that are kept in this list.
• The above step is performed recursively.
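The dependency bookkeeping described above can be sketched as below. The class name and the representation of an expression as a Python function over its dependencies are assumptions made for illustration; a real assembler would store the expression text.

```python
class MultiPassSymtab:
    def __init__(self):
        self.values = {}       # fully defined symbols
        self.pending = {}      # symbol -> (still-undefined deps, all deps, function)
        self.dependents = {}   # symbol -> symbols whose value depends on it

    def define_expr(self, name, deps, func):
        """Define `name` as func(*values of deps); defer it if any dep is undefined."""
        missing = {d for d in deps if d not in self.values}
        if not missing:
            self._resolve(name, func(*[self.values[d] for d in deps]))
            return
        self.pending[name] = (missing, list(deps), func)
        for dep in missing:
            self.dependents.setdefault(dep, []).append(name)

    def define_value(self, name, value):
        self._resolve(name, value)

    def _resolve(self, name, value):
        """Store the value, then recursively re-evaluate symbols waiting on it."""
        self.values[name] = value
        for waiting in self.dependents.pop(name, []):
            missing, deps, func = self.pending[waiting]
            missing.discard(name)
            if not missing:
                del self.pending[waiting]
                self._resolve(waiting, func(*[self.values[d] for d in deps]))
```

For example, if HALFSZ depends on MAXLEN and MAXLEN depends on BUFEND and BUFFER, defining BUFFER and then BUFEND resolves MAXLEN and, recursively, HALFSZ.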
114
Forward Reference Example
Defined
116
Forward Reference Processing
117
Forward Reference Processing
118
Forward Reference Processing
120
Implementation Examples
• Microsoft MASM Assembler
• Sun Sparc Assembler
• IBM AIX Assembler
121
Microsoft MASM Assembler
• Assembler language program is written as a collection of
segments
• SEGMENT
• Each segment is defined as belonging to a particular class, CODE, DATA,
CONST, STACK
• registers: CS (code), SS (stack), DS, ES, FS, GS (Data)
• similar to program blocks in SIC
• ASSUME
• e.g. ASSUME ES:DATASEG2
• e.g. MOV AX, DATASEG2
MOV ES, AX
• similar to BASE in SIC
122
Microsoft MASM Assembler
• JUMP with forward reference
• near jump (within the code segment): 2 or 3 bytes
• far jump (to a different segment): 5 bytes
• e.g. JMP TARGET (not sure whether near / far jump)
• JMP FAR PTR TARGET
• JMP SHORT TARGET
• Pass 1: reserves 3 bytes for jump instruction
• phase error
• PUBLIC, EXTRN
• similar to EXTDEF, EXTREF in SIC
123
Sun Sparc Assembler
• Sections
• .TEXT (Executable instruction)
• .DATA (Initialized read/write data)
• .RODATA (Read only data)
• .BSS (Uninitialized data areas)
• Separate location counter is maintained for each section.
• Similar to program blocks in SIC
• Symbols
• global vs. weak
• similar to the combination of EXTDEF and EXTREF in SIC
• Delayed branches
• delay slots (NOP)
• annulled branch instruction (A)
124
AIX Assembler
• Base relative addressing
• save instruction space, no absolute address
• base register table:
• general purpose registers can be used as base register
• easy for program relocation
• only data whose values are actual addresses need to be modified
• e.g. USING LENGTH, 1
• USING BUFFER, 4
• Similar to BASE in SIC
• DROP
125
AIX Assembler
• Alignment
• instruction (2)
• data: halfword operand (2), fullword operand (4)
• Slack bytes
• .CSECT
• control sections: RO(read-only data), RW(read-write data),
PR(executable instructions), BS(uninitialized read/write data)
• dummy section
126
Summary
• Module 2
• Assemblers and basic functions
• A simple SIC assembler, algorithm (flowchart) and data
structures, writing object code and object program.
• Assembler Features
127
Summary
Generates
Load and go
Object Code
Implementation Examples
128