
Assemblers

Elements of Assembly Language Programming,


Design of the Assembler,
Assembler Design Criteria,
Types of Assemblers,
Two-Pass Assemblers,
One-Pass Assemblers,
Single pass Assembler for Intel x86,
Algorithm of Single Pass Assembler,
Multi-Pass Assemblers,
Advanced Assembly Process,
Variants of Assemblers,
Design of two pass assembler
10/21/2007 1
Fundamental Functions
• Generate machine language
– Translate mnemonic operation codes to machine
code
• Assign addresses to symbolic labels used by
the programmer

Elements of Assembly Language
Programming
• Mnemonics
– Also called opcodes.
– Help with error diagnostics.
• Symbolic Operands
– Symbolic names can be associated with data or instructions.
– Assembler performs memory binding to these names.
• Data Declaration
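– -5 is stored as (11111010)2, its 8-bit one's complement form
– 10.5 is stored as (41A80000)16, its 32-bit hexadecimal floating-point form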
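The two conversions above can be checked with a short sketch. It assumes that the binary value is an 8-bit one's complement pattern and that the hexadecimal value uses a 32-bit hexadecimal floating-point layout (sign bit, 7-bit excess-64 exponent of 16, 24-bit fraction); both are inferred from the values on the slide, and the function names are hypothetical.

```python
def ones_complement(value, bits=8):
    """Bit pattern of a signed integer in one's complement."""
    mask = (1 << bits) - 1
    # A negative number is the bitwise inverse of its magnitude.
    pattern = value if value >= 0 else ~(-value) & mask
    return format(pattern, f"0{bits}b")

def hex_float(value):
    """32-bit hexadecimal floating point: sign, excess-64 exponent, fraction."""
    if value == 0:
        return "00000000"
    sign = 0x80 if value < 0 else 0
    magnitude, exponent = abs(value), 0
    while magnitude >= 1:          # normalise so that 1/16 <= magnitude < 1
        magnitude /= 16
        exponent += 1
    while magnitude < 1 / 16:
        magnitude *= 16
        exponent -= 1
    fraction = int(magnitude * (1 << 24))
    return f"{((sign | (exponent + 64)) << 24) | fraction:08X}"

print(ones_complement(-5))   # 11111010
print(hex_float(10.5))       # 41A80000
```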

Elements of Assembly Language
Programming- Statement Format
• [Label] <Opcode> <operand spec>[, <operand spec> …]
– [..] means the item is optional.
– The label becomes a symbolic name for the memory word(s) generated for the statement.
– <operand spec> has the following syntax:
• <symbolic name> [+ <displacement>][(<index register>)]
• Ex: operands
– AREA, AREA+5, AREA(4), AREA+5(4).
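As a rough illustration of the <operand spec> syntax, the sketch below parses the four example operands with a regular expression. The function name and the choice to treat the index register as a number are assumptions made for this example only.

```python
import re

# <symbolic name> [+ <displacement>] [(<index register>)]
# Spaces around '+' are tolerated, since operands are sometimes written "AREA +5(4)".
OPERAND_SPEC = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9]*)"
    r"(?:\s*\+\s*(?P<disp>\d+))?"
    r"(?:\s*\(\s*(?P<index>\d+)\s*\))?\s*$"
)

def parse_operand_spec(text):
    """Return (name, displacement, index_register) or raise ValueError."""
    match = OPERAND_SPEC.match(text)
    if match is None:
        raise ValueError(f"malformed operand spec: {text!r}")
    disp = int(match.group("disp")) if match.group("disp") else 0
    index = int(match.group("index")) if match.group("index") else None
    return match.group("name"), disp, index

for spec in ["AREA", "AREA+5", "AREA(4)", "AREA +5(4)"]:
    print(spec, "->", parse_operand_spec(spec))
```

A missing displacement is reported as 0 and a missing index register as None, so the caller can tell "no index" apart from "index register 0".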

Elements of Assembly Language
Programming- a simple assembly language
• Each statement has two operands
– First is always a register
• AREG, BREG, CREG and DREG.
– Second is a memory word using a symbolic name
and an optional displacement.
• BC Statement
– BC <condition code spec>, <memory address>

Instruction opcode   Assembly mnemonic   Remarks
00                   STOP                Stop execution
01                   ADD                 First operand is modified; condition code is set
02                   SUB                 First operand is modified; condition code is set
03                   MULT                First operand is modified; condition code is set
04                   MOVER               Register ← memory move
05                   MOVEM               Memory ← register move
06                   COMP                Sets condition code
07                   BC                  Branch on condition
08                   DIV                 Analogous to SUB
09                   READ                First operand is not used
10                   PRINT               First operand is not used
• Machine instruction format

sign | opcode | reg operand | memory operand

• Condition codes for BC: LT = 1, LE = 2, EQ = 3, GT = 4, GE = 5, ANY = 6
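To make the instruction format concrete, here is a minimal sketch that splits a target word written as "sign opcode reg address" back into its fields. The textual layout of the word and the function name are assumptions for illustration; the opcode and condition-code values are the ones listed for this simple language.

```python
MNEMONIC_OF = {0: "STOP", 1: "ADD", 2: "SUB", 3: "MULT", 4: "MOVER", 5: "MOVEM",
               6: "COMP", 7: "BC", 8: "DIV", 9: "READ", 10: "PRINT"}
CONDITION_OF = {1: "LT", 2: "LE", 3: "EQ", 4: "GT", 5: "GE", 6: "ANY"}

def decode(word):
    """Split 'sign opcode reg address' into its fields and name the opcode."""
    sign, opcode, reg, address = word.split()
    mnemonic = MNEMONIC_OF[int(opcode)]
    reg = int(reg)
    # For BC the register field carries a condition code, not a register.
    first = CONDITION_OF[reg] if mnemonic == "BC" else reg
    return sign, mnemonic, first, int(address)

print(decode("+ 04 2 115"))   # ('+', 'MOVER', 2, 115)
print(decode("+ 07 2 104"))   # ('+', 'BC', 'LE', 104)
```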

        START  101
        READ   N               101) + 09 0 113
        MOVER  BREG, ONE       102) + 04 2 115
        MOVEM  BREG, TERM      103) + 05 2 116
AGAIN   MULT   BREG, TERM      104) + 03 2 116
        MOVER  CREG, TERM      105) + 04 3 116
        ADD    CREG, ONE       106) + 01 3 115
        MOVEM  CREG, TERM      107) + 05 3 116
        COMP   CREG, N         108) + 06 3 113
        BC     LE, AGAIN       109) + 07 2 104
        MOVEM  BREG, RESULT    110) + 05 2 114
        PRINT  RESULT          111) + 10 0 114
        STOP                   112) + 00 0 000
N       DS     1               113)
RESULT  DS     1               114)
ONE     DC     '1'             115) + 00 0 001
TERM    DS     1               116)
        END

Elements of Assembly Language Programming-
Assembly Language statements
• Three kinds of statements.
– Imperative statements
– Declaration Statements
– Assembler Directives.


10/21/2007 Dr. Monther Aldwairi 10


Elements of Assembly Language
Programming- Assembly Language
statements
• Use of constants
• The DC statement does not really implement constants. It simply initializes memory words with the given values.
• These values may be changed by moving a new value into the memory word.
• An assembly program can use constants, in the sense implemented in an HLL, in two ways:
– As immediate operands
– As literals
• A literal is an operand with the syntax ='<value>'.
• It differs from a constant because its location cannot be specified in the assembly program.
• This helps to ensure that its value is not changed during the execution of the program.

• ADD AREG, ='5' is equivalent to:
        ADD    AREG, FIVE
        ----
FIVE    DC     '5'
Elements of Assembly Language
Programming- Assembler Directives
• Assembler directives instruct the assembler to
perform certain actions during the assembly of a
program.
– Ex.
• START <CONSTANT>
• END [<OPERAND SPEC>]

Advantages of Assembly Language
• Symbolic programming in assembly language is easier to understand than machine language and saves the programmer time and effort.
• It is easier to correct errors and modify program instructions.
• Assembly language has the same execution efficiency as machine language, because there is a one-to-one translation between an assembly language program and its corresponding machine language program.
• Assembly language is more helpful than an HLL when a program must use specific architectural features of a computer.

Additional Functions
• Generate an image of what memory must look
like for the program to be executed.
• Interpret assembler directives (pseudo-instructions)
– They provide instructions to the assembler
– They do not translate into machine code
– They might affect the object code

Design of the Assembler &
Assembler Design Criteria
• The assembler can be designed to perform the following steps:
– Scanning (tokenizing)
– Parsing (validating the instructions)
– Creating the symbol table
– Resolving the forward references

• In other words, the assembler must:
– Convert mnemonic operation codes to their machine language equivalents
– Convert symbolic operands to their equivalent machine addresses
– Decide the proper instruction format
– Convert the data constants to internal machine representations
– Write the object program and the assembly listing
• So to design the assembler we need to concentrate on the architecture of the SIC/XE machine, and identify the algorithms and the data structures to be used.
Design Specifications

• Identify the information necessary to perform a task
• Design a suitable data structure to record the information
• Determine the processing necessary to obtain and maintain the information
• Determine the processing necessary to perform the task
Analysis Phase
• The primary function of the analysis phase is to build the symbol table.
• This requires memory allocation: the address of each symbol must be determined.
• To implement memory allocation, a data structure called the location counter (LC) is used.
• The LC always contains the address of the next memory word in the target program.
• It is initialized to the constant specified in the START statement.
• To update the contents of the LC, the analysis phase needs to know the lengths of the different instructions.
Synthesis Phase
• To synthesize MOVER BREG, ONE we need:
– The address of the memory word with which the name ONE is associated (this depends on the source program, so it must be made available by the analysis phase)
– The machine opcode corresponding to the mnemonic MOVER (this does not depend on the source program; it depends only on the assembly language)
• Use two data structures:
– Symbol table (name, address), built by the analysis phase
– Mnemonic table (mnemonic, opcode, length)
Data structures of the assembler
Summary
• The tasks performed by the analysis phase are as follows:
1. Isolate the label, mnemonic opcode and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC contents>) in a new entry of the symbol table.
3. Check the validity of the mnemonic opcode through a look-up in the mnemonics table.
4. Perform LC processing, i.e., update the value contained in LC by considering the opcode and operands of the statement.
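The four analysis steps can be sketched as a single scan over the source. This is only an illustration under stated assumptions: every imperative statement and every DC occupies one memory word, DS n reserves n words, a first field that is not a known mnemonic is treated as a label, and the sample program and function names are hypothetical.

```python
IMPERATIVE = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM",
              "COMP", "BC", "DIV", "READ", "PRINT"}
KNOWN = IMPERATIVE | {"START", "END", "DS", "DC"}

def split_statement(line):
    """Step 1: isolate the label, mnemonic and operand fields."""
    parts = line.split()
    label = None
    if parts and parts[0] not in KNOWN:    # leading non-mnemonic is a label
        label, parts = parts[0], parts[1:]
    if not parts:
        raise ValueError(f"no mnemonic in statement: {line!r}")
    return label, parts[0], " ".join(parts[1:])

def analyse(lines):
    """Scan the source once and return the symbol table."""
    symtab, lc = {}, 0
    for line in lines:
        label, mnemonic, operands = split_statement(line)
        if mnemonic not in KNOWN:          # step 3: validate the mnemonic
            raise ValueError(f"invalid mnemonic: {mnemonic!r}")
        if label is not None:              # step 2: enter (symbol, LC)
            symtab[label] = lc
        if mnemonic == "START":            # step 4: LC processing
            lc = int(operands)
        elif mnemonic == "DS":
            lc += int(operands)
        elif mnemonic == "END":
            break
        else:
            lc += 1                        # imperative statement or DC
    return symtab

program = ["START 200", "LOOP MOVER AREG, X", "ADD AREG, ONE",
           "BC ANY, LOOP", "STOP", "X DS 2", "ONE DC '1'", "END"]
print(analyse(program))   # {'LOOP': 200, 'X': 204, 'ONE': 206}
```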
Summary
• The tasks performed by the synthesis phase are as follows:
1. Obtain the machine opcode corresponding to the mnemonic from the mnemonics table.
2. Obtain the address of a memory operand from the symbol table.
3. Synthesize a machine instruction or the machine form of a constant, as the case may be.
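A minimal sketch of the three synthesis steps follows, using the symbol table that the analysis phase would produce for the sample program shown earlier. The register codes BREG = 2 and CREG = 3 are read off that listing; AREG = 1 and DREG = 4 are assumed, as are the function name and the output layout.

```python
MNEMONICS = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
             "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # AREG, DREG assumed
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}
SYMTAB = {"AGAIN": 104, "N": 113, "RESULT": 114, "ONE": 115, "TERM": 116}

def synthesize(mnemonic, operands, symtab):
    """Build the target word for one statement."""
    if mnemonic == "DC":                    # step 3: machine form of a constant
        value = int(operands[0].strip("'‘’"))
        return f"+ 00 0 {value:03d}"
    opcode = MNEMONICS[mnemonic]            # step 1: mnemonics table look-up
    reg, address = 0, 0
    for operand in operands:
        if operand in REGISTERS:
            reg = REGISTERS[operand]
        elif operand in CONDITIONS:
            reg = CONDITIONS[operand]
        else:
            address = symtab[operand]       # step 2: symbol table look-up
    return f"+ {opcode:02d} {reg} {address:03d}"   # step 3: machine instruction

print(synthesize("MOVER", ["BREG", "ONE"], SYMTAB))   # + 04 2 115
print(synthesize("BC", ["LE", "AGAIN"], SYMTAB))      # + 07 2 104
print(synthesize("DC", ["'1'"], SYMTAB))              # + 00 0 001
```

An undefined symbol surfaces as a KeyError from the symbol table look-up, which is where a real assembler would report an error instead.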
Pass Structure of Assembler
• Two Pass Translation
– It can handle forward references easily.
– LC processing is performed in the first pass, and symbols defined in the program are entered into the symbol table.
– The second pass synthesizes the target form using the address information found in the symbol table.
– In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program.
– The first pass constructs an intermediate representation (IR) of the source program for use by the second pass.
Two pass assembly
Pass Structure of Assembler
• Single Pass Translation
– LC processing and construction of the symbol table proceed as in two pass translation.
– The problem of forward references is tackled using a process called “backpatching”.
– The operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered.
• MOVER BREG, ONE [ONE is a forward reference]
– Table of Incomplete Instructions (TII)
• Each entry is a pair (<instruction address>, <symbol>)
• e.g. (101, ONE) in this case.
Difficulties: Forward Reference
• Forward reference: reference to a label that is
defined later in the program.

Loc    Label    Operator   Operand
1000   FIRST    STL        RETADR
1003   CLOOP    JSUB       RDREC
…      …        …          …
1012            J          CLOOP
…      …        …          …
1033   RETADR   RESW       1
Backpatching

• The problem of forward references is handled using a process called backpatching
– Initially, the operand field of an instruction containing a forward reference is left blank
– Ex: MOVER BREG, ONE can be only partially synthesized since ONE is a forward reference
– The instruction opcode and the address of BREG are assembled to reside in location 101
– To insert the second operand's address later, an entry is added to the Table of Incomplete Instructions (TII)
– A TII entry is a pair (<instruction address>, <symbol>), which is (101, ONE) here
– When the END statement is processed, the symbol table contains the addresses of all symbols defined in the source program
– The TII contains information about all forward references
– Each entry in the TII is now processed to complete the corresponding instruction
– Ex: the entry (101, ONE) is processed by obtaining the address of ONE from the symbol table and inserting it in the operand field of the instruction with assembled address 101
– Alternatively, when the definition of some symbol L is encountered, all forward references to L can be processed
Example
        START  100
        MOVER  BREG, N      LC = 100 (1 byte)
        MULT   BREG, N      LC = 101 (1 byte)
        STOP                LC = 102 (1 byte)
N       DS     5            LC = 103

Symbol   Address
N        103
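The example program can be run through a small single-pass sketch that uses backpatching. It is deliberately simplified and rests on assumptions: only the three mnemonics of the example are supported, labels appear only on DS statements, the register code for BREG is taken from the earlier listing, and all pending TII entries are resolved once the whole source has been read.

```python
OPCODES = {"MOVER": 4, "MULT": 3, "STOP": 0}   # subset needed for the example
REGISTERS = {"BREG": 2}

def assemble_single_pass(lines):
    """Return (memory, symtab, tii); memory maps address -> [opcode, reg, operand]."""
    memory, symtab, tii, lc = {}, {}, [], 0
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts[0] == "START":
            lc = int(parts[1])
        elif parts[0] == "END":
            break
        elif len(parts) == 3 and parts[1] == "DS":   # <label> DS <n>
            symtab[parts[0]] = lc
            lc += int(parts[2])
        else:
            mnemonic, operands = parts[0], parts[1:]
            reg, address = 0, 0
            if operands:
                reg, symbol = REGISTERS[operands[0]], operands[1]
                if symbol in symtab:
                    address = symtab[symbol]
                else:
                    address = None              # forward reference: leave blank
                    tii.append((lc, symbol))    # remember where to patch
            memory[lc] = [OPCODES[mnemonic], reg, address]
            lc += 1
    for location, symbol in tii:                # backpatch the blank fields
        memory[location][2] = symtab[symbol]
    return memory, symtab, tii

source = ["START 100", "MOVER BREG, N", "MULT BREG, N", "STOP", "N DS 5"]
memory, symtab, tii = assemble_single_pass(source)
print(tii)      # [(100, 'N'), (101, 'N')]
print(memory)   # {100: [4, 2, 103], 101: [3, 2, 103], 102: [0, 0, 0]}
```

Both references to N are forward references, so both instructions are first stored with a blank operand and completed only after N has been entered in the symbol table at address 103.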
Advanced Assembler Directives
• ORIGIN
– The syntax is ORIGIN <address spec>
– <address spec> is a memory operand or a constant.
• This directive indicates that LC should be set to the address given by <address spec>.
• ORIGIN is useful if the target program does not consist of consecutive memory words.
• It allows LC processing to be performed in a relative rather than absolute manner, e.g. relative to a symbol.
EQU statement
• EQU has the syntax:
<symbol> EQU <address spec>

• The EQU statement defines the symbol to represent the <address spec>.
• No LC processing is done for an EQU statement.
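The difference between ORIGIN and EQU can be sketched with one helper that evaluates an <address spec> and one that applies the directive. The function names, the symbol LOOP and the addresses used are hypothetical, and only the forms <constant> and <symbol>[+<displacement>] are handled.

```python
import re

def eval_address_spec(spec, symtab):
    """Evaluate <constant> or <symbol>[+<displacement>] to an address."""
    spec = spec.replace(" ", "")
    if spec.isdigit():
        return int(spec)
    match = re.fullmatch(r"([A-Za-z]\w*)(?:\+(\d+))?", spec)
    if match is None:
        raise ValueError(f"malformed address spec: {spec!r}")
    displacement = int(match.group(2)) if match.group(2) else 0
    return symtab[match.group(1)] + displacement

def apply_directive(label, directive, spec, symtab, lc):
    """Return the LC value after an ORIGIN or EQU statement."""
    value = eval_address_spec(spec, symtab)
    if directive == "ORIGIN":
        return value             # LC is set to the given address
    if directive == "EQU":
        symtab[label] = value    # symbol is defined; LC is left unchanged
        return lc
    raise ValueError(f"unsupported directive: {directive!r}")

symtab = {"LOOP": 202}
lc = apply_directive(None, "ORIGIN", "LOOP+2", symtab, 210)
print(lc)                        # 204
lc = apply_directive("BACK", "EQU", "LOOP", symtab, lc)
print(lc, symtab["BACK"])        # 204 202
```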


LTORG
• The LTORG statement permits the programmer to specify where the literals should be placed.
• At each LTORG statement, the assembler allocates memory to the literals collected in the literal pool so far.
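Literal pool allocation can be illustrated with a short sketch. It assumes that each statement and each literal occupies one memory word, that a pool is also placed when END is reached, and that statements are given as hypothetical (mnemonic, operand) pairs; none of these details are fixed by the text above.

```python
def allocate_literals(statements, start):
    """Return {(pool_number, literal): address} for a list of statements."""
    lc, pool, pool_no, placed = start, [], 1, {}
    for mnemonic, operand in statements:
        if mnemonic in ("LTORG", "END"):
            for literal in pool:             # place the pending pool here
                placed[(pool_no, literal)] = lc
                lc += 1
            pool, pool_no = [], pool_no + 1
            if mnemonic == "END":
                break
        else:
            # Collect each distinct literal once per pool.
            if operand.startswith("=") and operand not in pool:
                pool.append(operand)
            lc += 1
    return placed

statements = [("MOVER", "='5'"), ("ADD", "='1'"), ("LTORG", ""),
              ("SUB", "='1'"), ("STOP", ""), ("END", "")]
print(allocate_literals(statements, 200))
# {(1, "='5'"): 202, (1, "='1'"): 203, (2, "='1'"): 206}
```

The literal ='1' is allocated twice because it is used both before and after the LTORG statement, so it belongs to two different pools.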
Two Pass Assembler
• Read from input line
– LABEL, OPCODE, OPERAND

[Figure: Source program → Pass 1 → Intermediate file → Pass 2 → Object codes. Pass 1 uses OPTAB and SYMTAB; Pass 2 uses SYMTAB.]


Design of a two pass assembler
• Pass 1
– Separate the symbol, opcode and operand fields.
– Build the symbol table.
– Perform LC processing.
– Construct the intermediate representation.

• Pass 2
– Synthesize the target program.
Pass I of an assembler
• It uses the following data structures:
– OPTAB – A table of mnemonic opcodes and related information
– SYMTAB – Symbol table
– LITTAB – A table of literals used in the program
Data Structures of assembler Pass I
Algorithm-Assembler First Pass
Intermediate Code Forms
• The intermediate code consists of a set of IC units, each consisting of three fields.

• Various forms of IC exist because of the trade-off between processing efficiency and memory economy.
Mnemonics Field
• The mnemonic field contains a pair of the form (statement class, code), where statement class is one of IS, DL and AD, standing for imperative statement, declaration statement and assembler directive, respectively.
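As a sketch of how the (statement class, code) pair might be produced, the function below looks a mnemonic up in three tables. The IS codes reuse the opcodes of the simple language; the DL and AD codes are assumed values chosen for illustration, since the figure defining them is not reproduced in the text, and the function names are hypothetical.

```python
IS_CODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
            "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
DL_CODES = {"DC": 1, "DS": 2}                                   # assumed codes
AD_CODES = {"START": 1, "END": 2, "ORIGIN": 3, "EQU": 4, "LTORG": 5}  # assumed

def mnemonic_field(mnemonic):
    """Return the (statement class, code) pair for a mnemonic."""
    for statement_class, table in (("IS", IS_CODES), ("DL", DL_CODES),
                                   ("AD", AD_CODES)):
        if mnemonic in table:
            return statement_class, table[mnemonic]
    raise ValueError(f"unknown mnemonic: {mnemonic!r}")

def show(pair):
    """Format a pair in the usual (class, code) notation."""
    return f"({pair[0]}, {pair[1]:02d})"

for name in ["START", "MOVER", "DS", "END"]:
    print(name, show(mnemonic_field(name)))
```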

[Figure: codes for declaration statements and directives]


Intermediate Code for Imperative Statements
Variant II
Comparison of the variants
Algorithm for Pass II
Error Reporting of Assembler
