Lecture plan

Code & Name of subject: CS2304 System Software R/TP/01


Unit number & Name: II – ASSEMBLERS ISSUE: C REV: 00

Period :1

Introduction
An assembler is system software that accepts an assembly language program as its input
and produces its machine language equivalent, along with information for the loader, as its output.
It is a translator that converts an assembly language program into a machine language program.

The structure of the assembler is given as

[Figure: Assembly language program -> Assembler -> Machine language program and extra information for loading. The assembler uses data structures (Ex) symbol table, opcode table.]

Assembly language program


The sequence of instructions given to the assembler is called an assembly language program.
It is written using a set of mnemonics.


The format of an instruction varies from system to system, depending on the machine
architecture. In SIC, the format of an assembly language instruction is

Label    Opcode or Mnemonic    Operands

(Ex)
FIRST    STL     RETADR
CLOOP    JSUB    RDREC
         LDA     LENGTH
         RSUB
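
The three-field format above can be split mechanically. The following is a minimal sketch, not part of the original notes: the name parse_line and the rule "a statement that starts in column 1 carries a label" are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class SourceLine(NamedTuple):
    label: Optional[str]
    opcode: str
    operand: Optional[str]


def parse_line(line: str) -> SourceLine:
    """Split one statement into label, opcode and operand fields.

    Assumption: a statement that starts in column 1 carries a label;
    a statement that starts with whitespace does not.
    """
    has_label = not line[:1].isspace()
    fields = line.split()
    if not fields:
        raise ValueError("empty statement")
    label = fields.pop(0) if has_label else None
    if not fields:
        raise ValueError("missing opcode")
    opcode = fields[0]
    operand = fields[1] if len(fields) > 1 else None
    return SourceLine(label, opcode, operand)
```

With this convention, an unlabelled statement such as LDA LENGTH must be indented, otherwise LDA would be taken as the label.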

Machine language program
Each line of assembly language instruction is translated to machine language. The
machine language takes one of two forms, depending on the architecture:
1. hexadecimal form
2. binary form
SIC uses the hexadecimal form of machine code.
Data structures
An assembler builds or uses one or more data structures to perform the assembly process.
Two of these data structures are the symbol table (SYMTAB) and the operation code table (OPTAB).
Basic assembler functions
Fundamental functions of an assembler
– A simple SIC assembler
– Assembler algorithm and data structure

A simple SIC assembler
Mnemonic operation code -> Machine language
Symbolic labels -> Machine addresses

Period :2
Assembler Directives
• Basic assembler directives (pseudo instructions)
– START
Specify name and starting address for the program
– END
Indicate the end of the source program and (optionally) the first executable
instruction in the program
– BYTE
Generate character or hexadecimal constant, occupying as many bytes as
needed to represent the constant
– WORD
Generate one-word integer constant
– RESB
Reserve the indicated number of bytes for a data area
– RESW
Reserve the indicated number of words for a data area

SIC Assembler
• Assembler's task
– Convert mnemonic operation codes to their machine language equivalents
– Convert symbolic operands to their equivalent machine addresses
– Build machine instructions in proper format
– Convert data constants into internal machine representations (data formats)
– Write object program and the assembly listing

Forward Reference
• Definition
– A reference to a label that is defined later in the program
• Solution
– Two passes
• First pass: scan the source program for label definitions, accumulating and
assigning addresses
• Second pass: perform most of the actual instruction translation

Two Pass SIC Assembler


• Pass 1 (define symbols)
– Assign addresses to all statements in the program
– Save the addresses assigned to all labels for use in Pass 2
– Perform some processing of assembler directives, including those that affect address
assignment, such as BYTE and RESW
• Pass 2 (assemble instructions and generate object program)

Period :3
– Assemble instructions (generate opcode and look up addresses)
– Generate data values defined by BYTE, WORD
– Perform processing of assembler directives not done during Pass 1
– Write the object program and the assembly listing
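
Pass 1 as outlined above can be sketched as follows. This is an illustrative sketch only, assuming standard SIC (every instruction is 3 bytes long); the function names pass1 and constant_length, the tuple layout of a statement, and the OPTAB subset are assumptions, not part of the original notes.

```python
# Subset of the SIC operation code table (illustrative, not complete).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "JSUB": 0x48, "RSUB": 0x4C}


def constant_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    if operand[0] == "C":
        return len(body)
    if operand[0] == "X":
        return (len(body) + 1) // 2
    raise ValueError(f"bad BYTE operand: {operand}")


def pass1(statements):
    """Assign addresses and collect labels.

    statements is a list of (label, opcode, operand) tuples.
    Returns (symtab, program_length).
    """
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in statements:
        if opcode == "START":
            locctr = start = int(operand, 16)   # starting address is given in hex
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr              # label gets the current address
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += constant_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, locctr - start
```

Operands are not looked up here, so a forward reference such as STL RETADR causes no difficulty in Pass 1; only the labels and the instruction lengths matter.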

Assembler algorithm and data structures
• Operation Code Table (OPTAB)
• Symbol Table (SYMTAB)
• Location Counter (LOCCTR)

[Figure: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object program, with OPTAB, SYMTAB and LOCCTR as the supporting data structures]


OPTAB
• Contents:
– Mnemonic operation codes
– Machine language equivalents
– Instruction format and length
• During pass 1:

– Validate operation codes
– Find the instruction length to increase LOCCTR
• During pass 2:
– Determine the instruction format
– Translate the operation codes to their machine language equivalents
• Implementation: a static hash table

LOCCTR
• A variable used to accumulate addresses for address assignment; when a label is
encountered, LOCCTR gives the address to be associated with it.
• LOCCTR is initialized to the beginning address specified in the START statement.
• After each source statement is processed during pass 1, the length of the assembled
instruction or data area is added to LOCCTR.
SYMTAB
• Contents:
– Label name
– Label address
– Flags (to indicate error conditions)
– Data type or length
• During pass 1:
– Store label name and assigned address (from LOCCTR) in SYMTAB
• During pass 2:
– Symbols used as operands are looked up in SYMTAB
• Implementation:
– A dynamic hash table for efficient insertion and retrieval
– Should perform well with non-random keys (e.g. LOOP1, LOOP2)
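
The Pass 2 use of OPTAB and SYMTAB can be sketched as below for standard SIC, where an instruction is an 8-bit opcode, a 1-bit index flag x and a 15-bit address. This is a sketch under stated assumptions: the function name assemble_sic is hypothetical, OPTAB is a small subset, and any addresses passed in SYMTAB are example values.

```python
# Subset of OPTAB: mnemonic -> machine opcode (illustrative, not complete).
OPTAB = {"LDA": 0x00, "STL": 0x14, "STCH": 0x54, "RSUB": 0x4C}


def assemble_sic(opcode, operand, symtab):
    """Return the 6-hex-digit object code for one standard SIC instruction."""
    if opcode not in OPTAB:
        raise ValueError(f"invalid operation code: {opcode}")
    address = 0
    indexed = 0
    if operand:
        symbol, _, modifier = operand.partition(",")
        indexed = 1 if modifier.strip().upper() == "X" else 0
        if symbol not in symtab:
            raise KeyError(f"undefined symbol: {symbol}")
        address = symtab[symbol]                 # SYMTAB lookup
    # opcode in bits 23-16, x in bit 15, address in bits 14-0
    word = (OPTAB[opcode] << 16) | (indexed << 15) | (address & 0x7FFF)
    return f"{word:06X}"
```

A Python dict is itself a dynamic hash table, which is why it stands in for both tables here.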

Period :4

Machine-dependent assembler features

• Instruction Format and Addressing Mode
• Program relocation

Instruction Format and Addressing Mode
» PC-relative or Base-relative addressing: op m
» Indirect addressing: op @m
» Immediate addressing: op #c
» Extended format: +op m
» Index addressing: op m,x
» Register-to-register instructions
» Larger memory -> multi-programming (program allocation)
Translation
• Register translation
» Register names (A, X, L, B, S, T, F, PC, SW) and their values (0, 1, 2, 3, 4, 5, 6, 8, 9)
» Preloaded in SYMTAB
• Address translation
» Most register-memory instructions use program-counter-relative or base-relative
addressing
» Format 3: 12-bit displacement field
– base-relative: 0~4095
– pc-relative: -2048~2047
» Format 4: 20-bit address field
» The assembler attempts pc-relative addressing first

Period :5
Relative Addressing Modes
PC-relative
» e.g. 10 0000 FIRST STL RETADR 17202D
– displacement = RETADR - PC = 0030 - 0003 = 02D (hex)
» e.g. 40 0017 J CLOOP 3F2FEC
– displacement = CLOOP - PC = 0006 - 001A = -14 (hex) = FEC in 12-bit two's complement
Base-relative
» The base register is under the control of the programmer
» e.g. 12 LDB #LENGTH
» e.g. 13 BASE LENGTH
» e.g. 160 104E STCH BUFFER,X 57C003
– displacement = BUFFER - B = 0036 - 0033 = 3
» NOBASE is used to inform the assembler that the contents of the base register can no
longer be relied upon for addressing
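
The choice between the two relative modes can be sketched as follows. This is an illustrative sketch: the function name choose_displacement is an assumption, pc means the address of the next instruction, and base is None when no BASE directive is in effect.

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, 12-bit displacement field) for a format 3 instruction.

    pc is the address of the next instruction; base is the value announced
    by a BASE directive, or None when no base is available (NOBASE).
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF            # two's complement in 12 bits
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    raise ValueError("displacement out of range; use extended format (+op)")
```

For the STCH example, the instruction is at 104E, so PC = 1051; BUFFER at 0036 is out of pc-relative range and the base-relative displacement 3 is used instead.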

Address Translation
Immediate addressing
» e.g. 55 0020 LDA #3 010003
» e.g. 133 103C +LDT #4096 75101000
» e.g. 12 0003 LDB #LENGTH 69202D
– the immediate operand is the symbol LENGTH
– the address of this symbol LENGTH is loaded into register B
– LENGTH = 0033 = PC + displacement = 0006 + 02D
– if immediate mode is specified, the target address becomes the operand
Indirect addressing
» the target address is computed as usual (PC-relative or base-relative)
» only the n bit is set to 1
» e.g. 70 002A J @RETADR 3E2003
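
The object codes in the examples above follow from packing the opcode with the n, i, x, b, p, e flag bits and the displacement or address field. A minimal sketch, assuming the usual SIC/XE opcode values; the function name encode and the OPCODES subset are illustrative.

```python
# SIC/XE opcodes for the mnemonics used in the examples (subset).
OPCODES = {"LDA": 0x00, "LDB": 0x68, "LDT": 0x74, "STL": 0x14, "STCH": 0x54, "J": 0x3C}


def encode(mnemonic, field, *, n=1, i=1, x=0, b=0, p=0, e=0):
    """Build the object code for a format 3 (e=0) or format 4 (e=1) instruction.

    field is the 12-bit displacement (format 3) or 20-bit address (format 4).
    n=1, i=1 is simple addressing; n=0, i=1 immediate; n=1, i=0 indirect.
    """
    opcode_ni = OPCODES[mnemonic] | (n << 1) | i     # opcode with n and i in the low 2 bits
    xbpe = (x << 3) | (b << 2) | (p << 1) | e
    if e:
        word = (opcode_ni << 24) | (xbpe << 20) | (field & 0xFFFFF)
        return f"{word:08X}"
    word = (opcode_ni << 16) | (xbpe << 12) | (field & 0xFFF)
    return f"{word:06X}"
```

For example, J @RETADR uses n=1, i=0, p=1 with displacement 0030 - 002D = 003, which gives 3E2003.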

Program Relocation
Example Fig. 2.1
» Absolute program, starting address 1000
e.g. 55 101B LDA THREE 00102D
» Relocate the program to 2000
e.g. 55 201B LDA THREE 00202D
» Each absolute address must be modified
Example Fig. 2.5:
» Except for absolute addresses, the rest of the instructions need not be modified
– not a memory address (immediate addressing)
– PC-relative, Base-relative
» The only parts of the program that require modification at load time are those that
specify direct addresses

Period :6
Relocatable Program
Modification record
» Col 1 M
» Col 2-7 Starting location of the address field to be modified, relative
to the beginning of the program
» Col 8-9 Length of the address field to be modified, in half-bytes
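
The record layout above can be sketched in code, together with the adjustment a loader would make from it. This is an illustrative sketch: the function names are hypothetical, and the instruction 4B101036 used in the usage note is an assumed example of a format 4 instruction, not taken from these notes.

```python
def modification_record(start, half_bytes):
    """Format a Modification record: 'M', 6 hex digits of start, 2 hex digits of length."""
    return f"M{start:06X}{half_bytes:02X}"


def relocate(code, record, load_address):
    """Add load_address to the address field named by a Modification record.

    code is the program's object code as a bytearray, modified in place.
    """
    start = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    size = (half_bytes + 1) // 2                 # bytes that contain the field
    mask = (1 << (4 * half_bytes)) - 1           # low-order half-bytes hold the address
    value = int.from_bytes(code[start:start + size], "big")
    field = (value + load_address) & mask
    value = (value & ~mask) | field              # keep the bits above the address field
    code[start:start + size] = value.to_bytes(size, "big")
```

For a format 4 instruction 4B101036 located at relative address 0006, the 20-bit address field starts at 0007 and is 5 half-bytes long, so the record is M00000705; loading the program at 5000 changes the instruction to 4B106036.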

[Figure: Object Code]

Period :7
Machine-independent assembler features
Assembler features not closely related to machine architecture
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
• Control sections and program linking
Literals
It is convenient for the programmer to be able to write the value of a constant operand as a part of
the instruction that uses it. Such an operand is called a literal.

45 001A ENDFIL LDA =C'EOF' 032010
...
002D * =C'EOF' 454F46
215 1062 WLOOP TD =X'05' E32011
...
1076 * =X'05' 05

• In this assembler language notation, a literal is identified with the prefix =, which is followed by a
specification of the literal value.

The difference between a literal and an immediate operand
• With immediate addressing, the operand value is assembled as a part of the machine
instruction.
• With a literal, the assembler generates the specified value as a constant at
some other memory location. The address of this generated constant is used
as the target address for the machine instruction.

Literal pool
All of the literal operands used in a program are gathered together into one or more literal
pools.
Where should the literal pool be placed?
93 LTORG
002D * =C'EOF' 454F46
• The assembler directive LTORG tells the assembler to generate a literal pool here.
Literal for current value of location counter
• The value of the location counter can be denoted by the operand *.
• BASE *
• LDB =*
Handling duplicate literal operands
• The assembler should avoid storing duplicate literals.
• The easiest way to recognize duplicate literals is by comparison of the character
strings defining them.
• 100 LDA =C'EOF'
• 125 LDA =C'EOF'
• 160 LDA =X'454F46'
Literal operands with different literal names may have the same literal values: =C'EOF' and
=X'454F46' both generate the bytes 454F46. Recognizing literal operands that have different
literal names but the same literal values complicates the design of an assembler.
Literal table (LITTAB)
• The basic data structure needed to process literal operands is a literal table (LITTAB).
• LITTAB is often organized as a hash table, using the literal name or value as the key.

Period :8
Processing literal operands
Pass 1
• For each recognized literal operand, search LITTAB. If the literal is already present in the
table, no action is needed; if it is not present, the literal is added to LITTAB without assigning
its address.
• When an LTORG statement or the end of the program is encountered, the assembler scans
LITTAB and assigns an address to each literal.
• The location counter is updated to reflect the number of bytes occupied by each literal.
Pass 2
• Search LITTAB for each literal operand encountered.
• The data values specified by the literals in each literal pool are inserted at the appropriate
places in the object program, in the same way as values generated by BYTE or WORD statements.
• If a literal value represents an address in the program, the assembler must generate the
appropriate Modification record.
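
The Pass 1 handling of literals can be sketched as below. This is an illustrative sketch: the class name LiteralTable and its method names are assumptions, duplicates are recognized by literal name only, and only =C'...' and =X'...' literals are handled.

```python
def literal_bytes(name):
    """Return the bytes generated for a literal such as =C'EOF' or =X'05'."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal: {name}")


class LiteralTable:
    def __init__(self):
        # literal name -> {"value": bytes, "address": int or None}
        self.entries = {}

    def add(self, name):
        """Pass 1: record a literal operand; a duplicate name is ignored."""
        if name not in self.entries:
            self.entries[name] = {"value": literal_bytes(name), "address": None}

    def place_pool(self, locctr):
        """LTORG or END: assign addresses to pending literals, return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr
```

Keying the table by name means =C'EOF' and =X'454F46' are stored twice; keying by value would merge them at the cost of extra work.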


Symbol-defining statements
Assembler directives
• EQU
• ORG
Assembler directive: EQU
• Most assemblers provide an assembler directive that allows the programmer to define
symbols and specify their values.
Symbol EQU value
• When the assembler encounters the EQU statement, it enters "symbol" into SYMTAB with
the value given by the operand.
Use of EQU
• Establish symbolic names that can be used for improved readability in place of numeric
values.
+LDT #4096
can be written as
MAXLEN EQU 4096
+LDT #MAXLEN
• Define mnemonic names for registers.
A EQU 0
X EQU 1
L EQU 2
• Establish and use names that reflect the logical function of the registers in the program.
BASE EQU R1
COUNT EQU R2
INDEX EQU R3

Period :9
Assembler directive: ORG
• The assembler directive ORG is usually used to indirectly assign values to symbols.
ORG value
"Value" is a constant or an expression involving constants and previously defined
symbols.
• When this statement is encountered, the assembler resets its location counter (LOCCTR) to
the specified value.
Use ORG for label definition
• Suppose that we want to define a table with the following structure.
STAB (100 entries)
SYMBOL    VALUE     FLAGS
6 bytes   3 bytes   2 bytes
• In some assemblers, the previous value of LOCCTR is automatically remembered, so we can
write
ORG
to return to the normal use of LOCCTR.
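
The effect of ORG on LOCCTR for the STAB table can be sketched as follows. This is an illustrative sketch: the class LocationCounter, its methods, and the starting address 1000 (hex) are assumptions; the table size 1100 bytes is 100 entries of 6 + 3 + 2 bytes.

```python
class LocationCounter:
    def __init__(self, start=0):
        self.value = start
        self.saved = None          # LOCCTR value remembered by the last ORG

    def org(self, value=None):
        """ORG value: remember LOCCTR and reset it. ORG with no operand: restore it."""
        if value is None:
            if self.saved is None:
                raise ValueError("no previous LOCCTR value to restore")
            self.value, self.saved = self.saved, None
        else:
            self.saved, self.value = self.value, value

    def reserve(self, symtab, label, nbytes):
        """Give label the current address, then advance LOCCTR by nbytes."""
        symtab[label] = self.value
        self.value += nbytes


symtab = {}
loc = LocationCounter(0x1000)
loc.reserve(symtab, "STAB", 1100)     # STAB    RESB 1100
loc.org(symtab["STAB"])               #         ORG  STAB
loc.reserve(symtab, "SYMBOL", 6)      # SYMBOL  RESB 6
loc.reserve(symtab, "VALUE", 3)       # VALUE   RESW 1
loc.reserve(symtab, "FLAGS", 2)       # FLAGS   RESB 2
loc.org()                             #         ORG  (back to the end of STAB)
```

After this, SYMBOL, VALUE and FLAGS name offsets 0, 6 and 9 of the first entry, so with indexed addressing they refer to the fields of any entry.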
Restrictions of EQU and ORG in an ordinary two-pass assembler
For an ordinary two-pass assembler, all symbols must be defined during Pass 1. Hence, the
following sequences could not be processed by an ordinary two-pass assembler.
• All terms used to specify the value of the new symbol must have been defined previously in
the program.

BETA EQU ALPHA
ALPHA RESW 1
Disallowed


ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
Disallowed

ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1
Disallowed

ALPHA RESW 1
BETA EQU ALPHA
Allowed
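
The restriction can be seen in a sketch of how Pass 1 might process EQU. The function name define_equ is hypothetical, and the operand is limited to a decimal constant or a single symbol for brevity.

```python
def define_equ(symtab, label, operand):
    """Enter label into SYMTAB with the value named by operand.

    operand is a decimal constant or a symbol that must already be in SYMTAB.
    """
    if operand.isdigit():
        value = int(operand)
    elif operand in symtab:
        value = symtab[operand]
    else:
        # The symbol is defined later in the program: Pass 1 has no value for it yet.
        raise ValueError(f"{label} EQU {operand}: {operand} is not yet defined")
    symtab[label] = value
```

BETA EQU ALPHA succeeds only when ALPHA has already been entered into SYMTAB, which is exactly the "Allowed" ordering above.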
Period :10

Expressions
• Most assemblers allow the use of expressions wherever a single operand such as a label or
literal is permitted.
• Each such expression must be evaluated by the assembler to produce a single
operand address or value.
• Assemblers generally allow arithmetic expressions formed according to the normal rules using
the operators +, -, *, and /.
• Individual terms in the expression may be
– constants,
– user-defined symbols, or
– special terms.
• The most common special term is the current value of the location
counter (often designated by *).

Types of terms
• Absolute terms - The value of an absolute term is independent of program location.
• Relative terms - The value of a relative term is dependent on the beginning address of the
program.
Types of expressions
By the type of value produced, expressions can be classified as
1. Absolute expressions
• The value of an absolute expression is independent of the program location.
• An absolute expression may contain relative terms provided the relative terms occur in
pairs and the terms in each such pair have opposite signs. No relative term may enter into a
multiplication or division operation.
• e.g. MAXLEN EQU BUFEND-BUFFER
2. Relative expressions
• The value of a relative expression is relative to the beginning address of the object
program.
• A relative expression is one in which all of the relative terms except one can be paired
as described above. The remaining unpaired term must have a positive sign. No relative
term may enter into a multiplication or division operation.
Expressions that are neither relative nor absolute should be flagged by the assembler as errors.
Determining types of expressions
A Type field is added to SYMTAB (A = absolute, R = relative):
Symbol   Type   Value
MAXLEN   A      1000
BUFEND   R      1036
BUFFER   R      0036
RETADR   R      0030
BUFEND+BUFFER, 100-BUFFER, and 3*BUFFER are neither relative expressions nor
absolute expressions.
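
The pairing rule can be checked by counting signed relative terms: a net count of 0 means absolute, +1 means relative, anything else is an error. A minimal sketch, assuming expressions without parentheses and without the special term *; the function name classify is hypothetical, and symbols missing from the type table are treated as absolute.

```python
import re


def classify(expression, types):
    """Return 'absolute' or 'relative'; raise ValueError for anything else.

    types maps each symbol to its SYMTAB type flag, 'A' or 'R'.
    """
    relative_count = 0
    # Split into signed additive terms, e.g. "BUFEND-BUFFER" -> ("", "BUFEND"), ("-", "BUFFER")
    for sign, term in re.findall(r"([+-]?)([^+-]+)", expression.replace(" ", "")):
        factors = re.split(r"[*/]", term)
        relative = [f for f in factors if types.get(f) == "R"]
        if not relative:
            continue                               # constants and absolute symbols
        if len(factors) > 1:
            raise ValueError("relative term used in multiplication or division")
        relative_count += -1 if sign == "-" else 1
    if relative_count == 0:
        return "absolute"
    if relative_count == 1:
        return "relative"
    raise ValueError("expression is neither absolute nor relative")
```

With the table above, BUFEND-BUFFER is absolute, while BUFEND+BUFFER, 100-BUFFER and 3*BUFFER are all rejected.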

Program blocks
Assembler directive: USE

USE indicates which portions of the source program belong to the various blocks.

Period :11
Control section and program linking
Control section
i. A control section is a part of the program that maintains its identity after assembly.
ii. Each control section can be loaded and relocated independently of the others.
iii. Different control sections are most often used for subroutines or other logical
subdivisions of a program.
Assembler directive: CSECT
CSECT: signals the start of a new control section.
The assembler establishes a separate location counter (initialized to 0) for each control section.

Assembler directives: EXTDEF, EXTREF
EXTDEF: external definition
The EXTDEF statement in a control section names symbols, called external symbols, that are
defined in this control section and may be used by other sections. Control section names are
automatically considered to be external symbols.
EXTREF: external reference
The EXTREF statement names symbols that are used in this control section and are defined
elsewhere.

Period :12
One pass assemblers and Multi pass assemblers
One-Pass Assemblers
• Scenarios for one-pass assemblers
– Generate object code in memory for immediate execution: a load-and-go assembler.
– External storage for the intermediate file between two passes is slow or
inconvenient to use.
• Main problem - Forward references
i. Data items
ii. Labels on instructions
• Solution
i. Require that all areas be defined before they are referenced.
ii. It is possible, although inconvenient, to do so for data items.
iii. Forward jumps to instruction labels cannot be easily eliminated.
– Insert (label, address_to_be_modified) into SYMTAB
– Usually, address_to_be_modified is stored in a linked list
Forward Reference in One-pass Assembler
• Omits the operand address if the symbol has not yet been defined.
• Enters this undefined symbol into SYMTAB and indicates that it is undefined.
• Adds the address of this operand address to a list of forward references associated with the
SYMTAB entry.
• When the definition for the symbol is encountered, scans the reference list and inserts the
address.
• At the end of the program, reports an error if there are still SYMTAB entries marked as
undefined symbols.
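
The steps above can be sketched for a load-and-go assembler that builds standard SIC object code in a bytearray. This is an illustrative sketch: the class OnePassSymtab and its method names are assumptions, and a Python list stands in for the linked list of forward references.

```python
class OnePassSymtab:
    def __init__(self):
        self.address = {}      # defined symbol -> address
        self.pending = {}      # undefined symbol -> operand addresses to be modified

    def reference(self, symbol, operand_address):
        """Return the symbol's address, or 0 after recording a forward reference."""
        if symbol in self.address:
            return self.address[symbol]
        self.pending.setdefault(symbol, []).append(operand_address)
        return 0               # operand address is omitted for now

    def define(self, symbol, address, memory):
        """Record the definition and insert the address at every earlier reference."""
        if symbol in self.address:
            raise ValueError(f"duplicate symbol: {symbol}")
        self.address[symbol] = address
        for where in self.pending.pop(symbol, []):
            memory[where] |= address >> 8          # keeps the x bit already stored there
            memory[where + 1] = address & 0xFF

    def undefined(self):
        """Symbols still undefined; non-empty at the end of the program is an error."""
        return sorted(self.pending)
```

Here operand_address is the address of the second byte of the 3-byte instruction, where the 15-bit address field begins.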

Period :13
Multi-Pass Assemblers
• For a two-pass assembler, forward references in symbol definition are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
• Symbol definition must be completed in Pass 1.
• Prohibiting forward references in symbol definition is not a serious inconvenience.
• Forward references tend to create difficulty for a person reading the program.
• A multi-pass assembler removes this restriction by making as many passes as are needed
to process the definitions of symbols.

Implementation
For a forward reference in symbol definition, we store in the SYMTAB:
i. The symbol name
ii. The defining expression
iii. The number of undefined symbols in the defining expression
iv. Each undefined symbol (marked with a flag *), together with a list of the symbols that
depend on it.
v. When a symbol is defined, we can recursively evaluate the symbol expressions
depending on the newly defined symbol.
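
The SYMTAB organization in items i to v can be sketched as follows. This is an illustrative sketch, assuming defining expressions that use only + and - between symbols and decimal constants; the class MultiPassSymtab and its method names are hypothetical.

```python
import re


class MultiPassSymtab:
    def __init__(self):
        self.value = {}        # symbol -> resolved value
        self.pending = {}      # symbol -> [defining expression, number of undefined symbols]
        self.dependents = {}   # undefined symbol -> symbols whose expressions use it

    def define(self, symbol, expression):
        """Handle 'symbol EQU expression'."""
        names = set(re.findall(r"[A-Za-z]\w*", expression))
        missing = [n for n in names if n not in self.value]
        if missing:
            self.pending[symbol] = [expression, len(missing)]
            for name in missing:
                self.dependents.setdefault(name, []).append(symbol)
        else:
            self.resolve(symbol, self.evaluate(expression))

    def resolve(self, symbol, value):
        """Record a value, then re-examine every symbol that was waiting on it."""
        self.value[symbol] = value
        for waiter in self.dependents.pop(symbol, []):
            entry = self.pending[waiter]
            entry[1] -= 1                          # one fewer undefined symbol
            if entry[1] == 0:
                del self.pending[waiter]
                self.resolve(waiter, self.evaluate(entry[0]))

    def evaluate(self, expression):
        """Evaluate an expression whose symbols are all resolved."""
        total = 0
        for sign, term in re.findall(r"([+-]?)(\w+)", expression.replace(" ", "")):
            number = int(term) if term.isdigit() else self.value[term]
            total += -number if sign == "-" else number
        return total
```

An ordinary label definition such as DELTA RESW 1 corresponds to calling resolve directly with the current LOCCTR value, which then resolves BETA and ALPHA in turn.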

IMPLEMENTATION EXAMPLE
• MASM assembler
• SPARC assembler

MASM assembler
• The MASM assembler is written for Pentium and other x86 systems.
• Since x86 systems view memory as a collection of segments, a MASM assembler
language program is written as a collection of segments.
• Each segment is defined as belonging to a particular class.
• Commonly used classes are CODE, DATA, CONST and STACK.
• During program execution, segments are addressed via the x86 segment registers.
• Code segments are addressed using register CS.
• Stack segments are addressed using register SS.
• Data segments are addressed using DS, ES, FS, or GS.
• Jump instructions are assembled in two different ways:
1. near jump
2. far jump
