
1. Introduction
There are two main classes of programming languages: high level (e.g., C,
Pascal) and low level.

Assembly language is a low level programming language. Programmers code symbolic instructions, each of which generates machine instructions.

An assembler is a program that accepts as input an assembly language program (also called mnemonic code) and produces its machine language equivalent (object code) along with the information needed by the loader.

Figure: Executable program generation from an assembly source code (assembly language program → assembler → linker → exe)


Assembler language structure
<Label> <Mnemonic> <Operand> Comments

• Label
– symbolic label for an assembler address (the instruction address at machine level)
• Mnemonic
– symbolic description of an operation
• Operands
– contain variables or addresses, if necessary
• Comments
– ignored by the assembler
– used by humans to document and understand programs
– tips for useful comments:
• avoid restating the obvious, as in "decrement R1"
• provide additional insight, as in "accumulate product in R6"
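To make the four fields concrete, here is a minimal Python sketch that splits one source line into label, mnemonic, operand and comment. It assumes conventions that are not stated above: a label is present only when the line starts in column 1, fields are separated by blanks, and the operand is written without embedded blanks, so anything after it is taken as the comment. The names Statement and parse_statement are hypothetical.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operand: str
    comment: str


def parse_statement(line: str) -> Statement:
    # Assumption: a label exists only if column 1 is not blank.
    has_label = bool(line) and not line[0].isspace()
    # Split into at most 4 fields (with label) or 3 (without);
    # the last field keeps any blanks, so it holds the whole comment.
    fields = line.split(maxsplit=3 if has_label else 2)
    label = fields.pop(0) if has_label else None
    mnemonic = fields[0]
    operand = fields[1] if len(fields) > 1 else ""
    comment = fields[2] if len(fields) > 2 else ""
    return Statement(label, mnemonic, operand, comment)
```

For example, parse_statement("LOOP     L     1,FIVE    load the constant") yields the label LOOP, mnemonic L, operand 1,FIVE and the comment "load the constant", while a line starting with blanks yields no label.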
Advantages of coding in assembly language are:
• Provides more control over handling particular hardware components
• May generate smaller, more compact executable modules
• Often results in faster execution

Disadvantages:
• Not portable
• More complex
• Requires understanding of hardware details (interfaces)
DESIGN OF ASSEMBLER

• Assemblers are typically designed with two passes.
• The first pass defines the symbols and the literals.
• The second pass generates the instructions and addresses.
Figure: Source program → Pass 1 → Intermediate file → Pass 2 → Object code (tables shown: OPTAB, SYMTAB)


Figure: Assembler process (assembly language program → Pass 1 → Pass 2 → executable image; symbol table)

First pass:
• scan the program file
• find all labels and calculate their corresponding addresses; the resulting table is called the symbol table
Second pass:
• convert instructions to machine language, using information from the symbol table
General Design Procedure of a Two-Pass Assembler
1. Specify the problem
2. Specify data structures
3. Define format of data structures
4. Specify algorithm
5. Look for modularity [the capability of a program to be subdivided into independent programming units]
6. Repeat 1 through 5 on modules.
STATEMENT OF PROBLEM
EX:
JOHN START 0
     USING *, 15
     L    1, FIVE
     A    1, FOUR
     ST   1, TEMP
FOUR DC   F'4'
FIVE DC   F'5'
TEMP DS   1F
     END
Intermediate steps in assembling a program (LC = relative address)

                        First pass                   Second pass
Source statement        LC   Mnemonic instruction    LC   Mnemonic instruction
JOHN START 0
     USING *, 15
     L    1, FIVE       0    L  1, -(0,15)           0    L  1, 16(0,15)
     A    1, FOUR       4    A  1, -(0,15)           4    A  1, 12(0,15)
     ST   1, TEMP       8    ST 1, -(0,15)           8    ST 1, 20(0,15)
FOUR DC   F'4'          12   4                       12   4
FIVE DC   F'5'          16   5                       16   5
TEMP DS   1F            20   -                       20   -
JOHN START 0
START is a pseudo-op that indicates the beginning of the program.
JOHN is the name of the program, which the assembler passes on to the loader.
0 indicates that the relative address of the program starts at 0.

USING *,15
USING is a pseudo-op that tells the assembler that register 15 is used as the base register and that, at execution time, it will contain the address of the first instruction of the program (relative address 0).

L 1, FIVE
This is a LOAD instruction of type RX. The offset of FIVE is not known during pass 1, so the equivalent mnemonic instruction after pass 1 is:
L 1, -(0,15)
where:
- represents the offset (not yet known)
0 represents the index register
15 represents the base register
A 1, FOUR
This is an ADD instruction of type RX. The offset of FOUR (an offset is the number of address locations added to a base address to reach a specific absolute address) is not known at this point, so the equivalent mnemonic instruction after pass 1 is:
A 1, -(0,15)

ST 1, TEMP
This stores the contents of register 1 in TEMP. It is an RX-type instruction.
The offset of TEMP is not known, so the equivalent mnemonic instruction after pass 1 is:
ST 1, -(0,15)
FOUR DC F'4'
DC is a pseudo-op that defines the constant value 4 for FOUR.
It is stored at relative location 12.
One full word (4 bytes) is reserved for this constant, so the location counter is incremented by 4.

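FIVE DC F'5'
Similarly, DC defines the constant value 5 for FIVE, stored in one full word at relative location 16, and the location counter is again incremented by 4.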
TEMP DS 1F
DS is a pseudo-op that defines storage space for TEMP.
One full word is reserved for TEMP at relative location 20.

The second pass evaluates the fields and generates the code; that is, it goes through the program and fills in the offset values.
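The table of intermediate steps can be reproduced with a small Python sketch of the two passes for the JOHN program. It is a simplified model, not a real assembler: statements are assumed to be pre-split into (label, mnemonic, operand) tuples, only the L, A and ST instructions and fullword DC/DS are handled, and only the form USING *,reg is supported. Pass 1 assigns each statement its LC and records label values; pass 2 computes each offset as the symbol's value minus the base register contents.

```python
from typing import List, Optional, Tuple

# The JOHN program as (label, mnemonic, operand) tuples; this tuple
# representation is an assumption made for the sketch.
PROGRAM: List[Tuple[Optional[str], str, str]] = [
    ("JOHN", "START", "0"),
    (None, "USING", "*,15"),
    (None, "L", "1,FIVE"),
    (None, "A", "1,FOUR"),
    (None, "ST", "1,TEMP"),
    ("FOUR", "DC", "F'4'"),
    ("FIVE", "DC", "F'5'"),
    ("TEMP", "DS", "1F"),
    (None, "END", ""),
]

RX_OPS = {"L", "A", "ST"}  # all three are 4-byte RX instructions


def pass1(program):
    """Assign relative addresses (LC), build the symbol table, note the USING."""
    lc, symtab, listing, base = 0, {}, [], None
    for label, op, operand in program:
        if op == "START":
            lc = int(operand)           # relative origin of the program
            continue
        if op == "USING":
            _star, reg = operand.split(",")
            base = (int(reg), lc)       # sketch supports only USING *,reg
            continue
        if label:
            symtab[label] = lc          # a label's value is the LC where it is defined
        if op in RX_OPS or op in ("DC", "DS"):
            listing.append((lc, op, operand))
            lc += 4                     # RX instruction or one full word (F)
    return symtab, listing, base


def pass2(symtab, listing, base):
    """Fill in each offset as symbol value minus base register contents."""
    reg, contents = base
    lines = []
    for lc, op, operand in listing:
        if op in RX_OPS:
            r1, sym = operand.split(",")
            lines.append(f"{lc} {op} {r1}, {symtab[sym] - contents}(0,{reg})")
        elif op == "DC":
            lines.append(f"{lc} {operand[2:-1]}")   # F'4' -> 4
        else:
            lines.append(f"{lc} -")                 # DS reserves space, no value
    return lines
```

Running pass2(*pass1(PROGRAM)) gives the second-pass column of the table: "0 L 1, 16(0,15)", "4 A 1, 12(0,15)", "8 ST 1, 20(0,15)", "12 4", "16 5" and "20 -".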
Specify the problem

Pass1: Define symbols & literals.


1) Determine the length of each machine instruction [MOTGET1]
2) Keep track of the Location Counter [LC]
3) Remember the values of symbols until pass 2 [STSTO]
4) Process some pseudo-ops [EQU, DS, etc.] [POTGET1]
5) Remember literals [LITSTO]

An outline of the steps involved in pass 1 is given in the following flowchart.
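The five pass-1 tasks can be sketched as a single loop. This is a simplified illustration, not the routines named above: the bracketed names (MOTGET1, STSTO, POTGET1, LITSTO) appear only as comments marking where each task happens, the small MOT subset is hypothetical (though its lengths follow the System/360 RR, RX and SS formats), every DC, DS and literal is assumed to occupy one full word, alignment is ignored, and literals are assumed to be placed in a pool after the last statement when END is reached.

```python
# Instruction lengths by System/360 format (RR = 2, RX = 4, SS = 6 bytes);
# this small MOT subset is illustrative only.
MOT_LENGTH = {"AR": 2, "L": 4, "A": 4, "ST": 4, "MVC": 6}


def pass1(statements):
    """statements: (label, mnemonic, operand) tuples. Returns symbols, literals, final LC."""
    lc = 0
    symtab = {}   # STSTO: symbol -> value
    littab = {}   # LITSTO: literal text -> address (None until assigned)
    for label, op, operand in statements:
        if op == "START":
            lc = int(operand)
            continue
        if op == "EQU":                      # POTGET1: value from operand, LC unchanged
            symtab[label] = int(operand)
            continue
        if label:
            symtab[label] = lc               # STSTO: remember the label's LC
        if op in MOT_LENGTH:                 # MOTGET1: look up instruction length
            if "=" in operand:               # LITSTO: remember literal, place it later
                littab.setdefault(operand.split("=", 1)[1], None)
            lc += MOT_LENGTH[op]
        elif op in ("DC", "DS"):             # POTGET1: assume one full word each
            lc += 4
        elif op == "END":                    # literal pool placed after the program
            for lit in littab:
                littab[lit] = lc
                lc += 4                      # assume full-word literals such as F'4'
    return symtab, littab, lc
```

For a program consisting of L 1,=F'4', A 1,FIVE, TWO EQU 2 and FIVE DC F'5', this sketch gives FIVE the value 8, TWO the value 2, and places the literal F'4' at relative location 12, just after the program.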
Pass2: Generate object program
1) Look up value of symbols [STGET]
2) Generate instruction [MOTGET2]
3) Generate data (for DS, DC & literals)
4) Process pseudo ops [POTGET2]

An outline of the steps involved in pass 2 is given in the following flowchart.
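Generating an instruction in pass 2 amounts to packing the fields of the mnemonic instruction into binary. The sketch below assembles a System/360 RX instruction: an 8-bit op-code, a 4-bit R1, a 4-bit index register X2, a 4-bit base register B2 and a 12-bit displacement D2. The op-codes for L, A and ST are the standard System/360 values; the function name encode_rx is hypothetical, and this is an illustration of the encoding rather than the MOTGET2 routine itself.

```python
# Op-codes of the three RX instructions used in the JOHN example (System/360).
OPCODES = {"L": 0x58, "A": 0x5A, "ST": 0x50}


def encode_rx(mnemonic: str, r1: int, disp: int, index: int, base: int) -> bytes:
    """Assemble an RX instruction D2(X2,B2): op(8) R1(4) X2(4) B2(4) D2(12)."""
    if not 0 <= disp <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    for reg in (r1, index, base):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be 0-15")
    return bytes([
        OPCODES[mnemonic],
        (r1 << 4) | index,          # R1 in the high nibble, X2 in the low nibble
        (base << 4) | (disp >> 8),  # B2, then the top 4 bits of D2
        disp & 0xFF,                # low 8 bits of D2
    ])
```

For the JOHN example, encode_rx("L", 1, 16, 0, 15) produces the bytes 58 10 F0 10, the binary form of L 1,16(0,15).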
Step 2. Data structures
Pass1: Databases
1. Input source program.
2. LC, the location counter, used to keep track of each instruction's location.
3. Machine operation table (MOT) [indicates the symbolic mnemonic for each instruction and its length (2, 4 or 6 bytes)].
4. Pseudo-operation table (POT) [indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 1].
5. Symbol Table (ST), to store each label and its corresponding value.
6. Literal Table (LT), to store each literal encountered and its corresponding assigned location.
7. A copy of the input, to be used later by pass 2. This may be stored on a secondary storage device, such as magnetic tape, disk or drum, or the original source may be read by the assembler a second time for pass 2.
Pass2: Databases
1. Copy of source program input to Pass1.
2. Location Counter (LC)
3. MOT [mnemonic, length, binary machine op-code, etc.], which indicates for each instruction its
a. Symbolic mnemonic
b. Length
c. Binary machine op-code
d. Format (e.g., RR, RS, RX, SI, SS)
4. POT [indicates for each pseudo op the Mnemonic & action to be taken in Pass2]
5. ST [prepared by Pass1, containing each label & its corresponding value]
6. Base Table (BT) [or register table], which indicates which registers are currently specified as base registers by USING pseudo-ops and what their contents are.
7. Literal table prepared by Pass1 [literal name and value].
8. A workspace, INST, used to hold each instruction as its various parts (e.g., binary op-code, register fields, length fields, displacement fields) are being assembled together.
9. A workspace, PRINT LINE, used to produce a printed listing.
10. A workspace, PUNCH CARD, used prior to actual output for converting the assembled instructions into the format needed by the loader.
11. An output deck of assembled instructions in the format needed by the loader.
Figure: Databases used by pass 1 and pass 2 (including the Base Table, BT)
Format of Data Structures
• Machine Operation Table
– The mnemonic op-code is the key, and its value is the binary op-code equivalent, which is used in generating machine code.
– The instruction length is stored for updating the location counter.
– The instruction format is used in forming the machine language equivalent.
Pseudo Operation Table

• It is a fixed table that contains the pseudo-ops (symbolic mnemonics) and the corresponding actions.
• The POTs for pass 1 and pass 2 have the same format: each entry contains a name and an address.
• Each POT entry is 8 bytes.
• Each pseudo-op is listed with an associated pointer to the assembler routine that processes the pseudo-op.
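The "pointer to the assembler routine" in each POT entry corresponds to a dispatch table. In Python the pointer can be modelled as a function stored in a dictionary, as in the sketch below. The state class, handler names and pass-1 actions shown are hypothetical, and every DC or DS is assumed to reserve one full word.

```python
from typing import Callable, Dict, Optional


class Pass1State:
    """Hypothetical pass-1 state: location counter and symbol table."""

    def __init__(self) -> None:
        self.lc = 0
        self.symtab: Dict[str, int] = {}


def do_start(st: Pass1State, label: Optional[str], operand: str) -> None:
    st.lc = int(operand)                  # set the program origin


def do_equ(st: Pass1State, label: Optional[str], operand: str) -> None:
    st.symtab[label] = int(operand)       # value comes from the operand, LC unchanged


def do_dc_ds(st: Pass1State, label: Optional[str], operand: str) -> None:
    if label:
        st.symtab[label] = st.lc
    st.lc += 4                            # assumes one full word, e.g. F'4' or 1F


def do_nothing(st: Pass1State, label: Optional[str], operand: str) -> None:
    pass                                  # USING and END need no LC change in this sketch


# The POT: each pseudo-op name maps to the routine that processes it.
POT1: Dict[str, Callable[[Pass1State, Optional[str], str], None]] = {
    "START": do_start,
    "EQU": do_equ,
    "DC": do_dc_ds,
    "DS": do_dc_ds,
    "USING": do_nothing,
    "END": do_nothing,
}


def process_pseudo_op(st: Pass1State, mnemonic: str, label: Optional[str], operand: str) -> bool:
    """Run the POT routine for mnemonic; return False if it is not a pseudo-op."""
    handler = POT1.get(mnemonic)
    if handler is None:
        return False                      # the caller would consult the MOT instead
    handler(st, label, operand)
    return True
```

Because the table is fixed, adding a pseudo-op only means adding one entry and one routine; the main pass-1 loop does not change.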
ST Symbol Table
• It is a variable table that contains labels and their values.
• The ST has the same format in pass 1 and pass 2 and contains symbol, value, length and relocation fields.
• Each entry is 14 bytes.
• The length field indicates the length in bytes of the instruction or data item to which the symbol is attached.
• It is used by the assembler to calculate the length codes used with certain SS-type instructions.

Ex:
• H DS F
the length of H is 4, since one full word is 4 bytes.
• AC EQU 2
the length of AC is 1 byte: a symbol defined by EQU takes the length of the term it is equated to, and a plain number such as 2 is assigned a length of 1.
The relocation field (relative location indicator) tells the assembler whether the value of the symbol is absolute (A) [i.e., the value of the symbol does not change if the program is moved in core] or relative (R) to the base of the program.
Ex:
• AC EQU 2 The relocation field of AC is A
• FOUR DC F'4' The relocation field of FOUR is R
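The length and relocation fields of the examples above can be derived mechanically. The sketch below assumes only the type codes H, F and D (2, 4 and 8 bytes on System/360), operands such as F, 1F or F'4', and the two EQU cases described above (a plain number, or another symbol). The names Symbol, define_storage and define_equ are hypothetical.

```python
from typing import Dict, NamedTuple

# Storage length of one item for the System/360 type codes used here.
TYPE_LENGTH = {"H": 2, "F": 4, "D": 8}


class Symbol(NamedTuple):
    value: int
    length: int
    relocation: str   # "R" = relocatable, "A" = absolute


def define_storage(symtab: Dict[str, Symbol], name: str, lc: int, operand: str) -> None:
    """DC/DS: the value is the LC, so the symbol moves with the program (R)."""
    type_code = operand.lstrip("0123456789")[0]   # '1F' -> 'F', "F'4'" -> 'F'
    symtab[name] = Symbol(lc, TYPE_LENGTH[type_code], "R")


def define_equ(symtab: Dict[str, Symbol], name: str, operand: str) -> None:
    """EQU: copy another symbol's attributes, or treat a number as absolute, length 1."""
    if operand in symtab:
        symtab[name] = symtab[operand]
    else:
        symtab[name] = Symbol(int(operand), 1, "A")
```

With these rules, H DS F gets length 4 and relocation R, AC EQU 2 gets length 1 and relocation A, and FOUR DC F'4' is relocatable.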
LT Literal Table
• It is a variable table that contains literals and their values.
• It has the same format as the symbol table, but holds literals instead of symbols.

Ex:
A 1, =F'4'
The literal table entry for this literal is as follows:

Symbol (8 bytes)   Value (4 bytes)   Length (1 byte)   Relocation (1 byte)
(characters)       (hexadecimal)     (hexadecimal)     (character)
F'4'               4                 04                R
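The 14-byte layout shared by ST and LT entries (8 + 4 + 1 + 1 bytes) can be expressed with Python's struct module. This sketch assumes ASCII characters padded with blanks and a big-endian value field; a real System/360 assembler would use EBCDIC, and names longer than 8 characters are silently truncated by the 8s format. The names ENTRY, pack_entry and unpack_entry are hypothetical.

```python
import struct

# symbol: 8 characters, value: 4-byte unsigned big-endian,
# length: 1 byte, relocation: 1 character. Total 14 bytes, no padding.
ENTRY = struct.Struct(">8sIBc")


def pack_entry(name: str, value: int, length: int, relocation: str) -> bytes:
    # Assumption: ASCII with blank padding (System/360 used EBCDIC).
    return ENTRY.pack(name.ljust(8).encode("ascii"), value, length,
                      relocation.encode("ascii"))


def unpack_entry(raw: bytes):
    name, value, length, reloc = ENTRY.unpack(raw)
    return name.decode("ascii").rstrip(), value, length, reloc.decode("ascii")
```

Packing the example entry with pack_entry("F'4'", 4, 4, "R") yields exactly 14 bytes, and unpack_entry recovers the four fields.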
BT Base Table
• It is used by the assembler to generate the proper base register reference in
machine instructions and to compute the correct offsets.
• In the assembly process, the assembler must generate an address, which
contains the offset, a base register number and an index register number.
• When generating an address, the assembler may use the base register table to
choose a base register that will contain a value closest to the symbolic
reference.

Base register number = the base register whose contents are closest to, but not greater than, the value of the symbolic reference.
Offset = value of symbol in symbol table - contents of base register.
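A possible reading of the base-register rule, sketched in Python: among the registers declared by USING, keep those for which the offset is non-negative and fits in the System/360 12-bit displacement (0 to 4095), then pick the one with the smallest offset. Treating "closest" this way, and the name base_and_offset, are assumptions of the sketch.

```python
from typing import Dict, Tuple

MAX_DISP = 4095  # System/360 displacements are 12-bit unsigned values


def base_and_offset(value: int, base_table: Dict[int, int]) -> Tuple[int, int]:
    """base_table maps register number -> contents declared via USING."""
    candidates = [
        (value - contents, reg)               # offset = symbol value - register contents
        for reg, contents in base_table.items()
        if 0 <= value - contents <= MAX_DISP  # offset must be non-negative and fit in 12 bits
    ]
    if not candidates:
        raise ValueError(f"no base register covers address {value}")
    offset, reg = min(candidates)             # smallest offset = closest base register
    return reg, offset
```

For the JOHN program the base table holds only register 15 with contents 0, so base_and_offset(16, {15: 0}) returns (15, 16), which is the 16(0,15) used for FIVE.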
The following table summarizes the fixed and variable tables used by the two-pass assembler:

No.  Table  Type      Pass 1 entries                       Pass 2 entries                             Size
1    MOT    Fixed     Mnemonic opcode, length              Mnemonic opcode, length and format         6 bytes per entry
2    POT    Fixed     Pseudo-op and address                Pseudo-op and address                      8 bytes per entry
3    ST     Variable  Symbol, value, length, relocation    Symbol, value, length, relocation          14 bytes per entry
4    LT     Variable  Literal, value, length, relocation   Literal, value, length, relocation         14 bytes per entry
5    BT     Variable  -                                    Base register number and register contents 4 bytes per entry
