
Lecture-15

Assembler, Loader and Linker


What is an Assembler?
• An assembler is a program that takes basic
computer instructions and converts them into
a pattern of bits that the computer's processor
can use to perform its basic operations.
• Some people call these instructions assembler
language and others use the term assembly
language.
How it Works
• The programmer can write a program using a sequence of these
assembler instructions.
• This sequence of assembler instructions, known as the source code or
source program, is then specified to the assembler program when that
program is started.
• The assembler program takes each program statement in the source
program and generates a corresponding bit stream or pattern (a series of
0's and 1's of a given length).
• The output of the assembler program is called the object code or object
program corresponding to the input source program.
• The sequence of 0's and 1's that constitute the object program is
sometimes called machine code.
• The object program can then be run (or executed) whenever desired.
Assemblers
• Assembler
– Converts assembly language programs into object files
– Object files contain a combination of machine instructions, data, and
information needed to place instructions properly in memory
• Assemblers need to
– translate assembly instructions and pseudo-instructions into machine
instructions
– convert decimal numbers and other constants specified by the programmer into binary
• Typically, assemblers make two passes over the assembly file
– First pass: reads each line and records labels in a symbol table
– Second pass: uses the information in the symbol table to produce the actual
machine code for each line
Object file format
Object file header | Text segment | Data segment | Relocation information | Symbol table | Debugging information

• Object file header describes the size and position of the other pieces of
the file
• Text segment contains the machine instructions
• Data segment contains binary representation of data in assembly file
• Relocation info identifies instructions and data that depend on absolute
addresses
• Symbol table associates addresses with external labels and lists
unresolved references
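• Debugging info contains a concise description of how the modules were
compiled, so that a debugger can relate machine instructions back to the source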
Assembler directive
• Assembler directives are pseudo instructions
– They provide instructions to the assembler itself
– They are not translated into machine operation codes
• The SIC assembler language has the following assembler directives.
– START Specify name and starting address for the program
– END Indicate the end of the source program and (optionally) specify
the first executable instruction in the program
– BYTE Generate character or hexadecimal constant, occupying as many
bytes as needed to represent the constant
– WORD Generate one-word integer constant
– RESB Reserve the indicated number of bytes for a data area
– RESW Reserve the indicated number of words for a data area
• Conventions used by the example program (these are not directives):
– End of record: a null character (00)
– End of file: a zero-length record
Example of a SIC assembler language program
[Figure: source listing of the example program; a forward reference is marked]
Example of a SIC assembler
language program (cont’d)
Lines beginning with “.” contain comments only.
Translation of source program to
object code
The translation requires the following functions:
• Convert mnemonic operation codes to their machine language equivalents
– e.g. translate STL to 14
– process assembler directives
• Convert symbolic operands to their equivalent machine addresses
– e.g. translate RETADR to 1033
– handle forward references
• two passes
– the first pass scans the source program for label definitions and assigns
addresses
– the second performs most of the actual translation.
• Build the machine instructions in the proper format
• Convert the data constants specified in the source program into their internal machine
representation
– e.g. translate EOF to 454F46
• Write the object program and the assembly listing
– Object program format
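The constant conversion listed above (e.g. EOF to 454F46) can be sketched in a few lines of Python. This is a minimal illustration, assuming ASCII character codes and a 3-byte (24-bit) SIC word; the function names byte_constant and word_constant are hypothetical.

```python
def byte_constant(operand: str) -> str:
    """Return the object code (hex) for a BYTE operand such as C'EOF' or X'F1'."""
    kind, body = operand[0], operand[2:-1]  # drop the C'/X' prefix and closing quote
    if kind == "C":
        # Each character becomes its ASCII code, written as two hex digits.
        return "".join(f"{ord(ch):02X}" for ch in body)
    if kind == "X":
        # A hex constant is already in object-code form.
        return body.upper()
    raise ValueError(f"unsupported BYTE operand: {operand}")


def word_constant(value: int) -> str:
    """Return a WORD constant as 6 hex digits (24-bit two's complement)."""
    return f"{value & 0xFFFFFF:06X}"


print(byte_constant("C'EOF'"))  # 454F46
print(word_constant(-10))       # FFFFF6
```

Masking with 0xFFFFFF is what turns a negative WORD value into the two's-complement form seen later in this lecture (for example FFFFF6 for -10).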
Program with object codes
[Figure: the example program listed with its object code; callouts mark STL, RETADR, and a large reserved memory space]

Format of object program
• Header record
Col. 1 H
Col. 2~7 Program name
Col. 8~13 Starting address of object program (hex)
Col. 14-19 Length of object program in bytes (hex)
• Text record
Col. 1 T
Col. 2~7 Starting address for object code in this record (hex)
Col. 8~9 Length of object code in this record in bytes (hex)
Col. 10~69 Object code, represented in hex (2 col. per byte)
• End record
Col.1 E
Col.2~7 Address of first executable instruction in object program (hex)
• “^” is used only to separate fields for readability; it does not appear in the actual object program
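The column layout above can be made concrete with a small record formatter. This is a sketch under simplifying assumptions: the function names are hypothetical, the addresses in the usage lines are illustrative, and text_records simply cuts the code into 30-byte chunks, whereas a real assembler avoids splitting an instruction across records and starts a new Text record after RESW/RESB.

```python
def header_record(name: str, start: int, length: int) -> str:
    # Col. 1 'H', cols 2-7 name (blank-padded), 8-13 start address, 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_records(start: int, code: bytes, max_bytes: int = 30) -> list[str]:
    # Col. 1 'T', cols 2-7 start, 8-9 length in bytes, 10-69 up to 30 bytes in hex
    records = []
    for offset in range(0, len(code), max_bytes):
        chunk = code[offset:offset + max_bytes]
        records.append(f"T{start + offset:06X}{len(chunk):02X}{chunk.hex().upper()}")
    return records


def end_record(first_executable: int) -> str:
    # Col. 1 'E', cols 2-7 address of the first executable instruction
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))          # HCOPY  00100000107A
print(text_records(0x1000, bytes.fromhex("141033")))  # ['T00100003141033']
print(end_record(0x1000))                             # E001000
```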
Object program

• No object code corresponds to addresses 1033-2038. This storage is reserved by the loader for use by the program during execution.
A simple two-pass assembler
• Pass 1 (define symbols)
– Assign addresses to all statements in the program.
– Save the values (addresses) assigned to all labels for use in Pass 2.
– Perform some processing of assembler directives, including processing that
affects address assignment, such as determining the length of data areas
defined by BYTE, RESW, etc.
• Pass 2 (assemble instructions and generate object program)
– Assemble instructions (translate operation codes and look up addresses).
– Generate data values defined by BYTE, WORD, etc.
– Perform processing of assembler directives not done during Pass 1.
– Write the object program and the assembly listing.
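The two passes above can be sketched as a toy Python assembler. It is a minimal illustration, not a full SIC assembler: the OPTAB holds only a hypothetical subset of SIC opcodes, indexed addressing and error reporting are omitted, source lines are pre-split into (label, opcode, operand) tuples, and the sample program is made up for illustration. Note how the forward reference to RETADR is resolved only because Pass 1 has already filled SYMTAB.

```python
# Hypothetical OPTAB subset: SIC mnemonics and their machine opcodes.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "J": 0x3C, "JSUB": 0x48, "RSUB": 0x4C}


def length_of(opcode, operand):
    """Bytes generated by one statement; used to advance LOCCTR in Pass 1."""
    if opcode in OPTAB or opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        body = operand[2:-1]
        return len(body) if operand[0] == "C" else len(body) // 2
    return 0  # START, END: no storage


def pass1(source):
    """Assign addresses, build SYMTAB, and return (SYMTAB, intermediate file)."""
    symtab, intermediate, locctr = {}, [], 0
    for label, opcode, operand in source:
        if opcode == "START":
            locctr = int(operand, 16)  # starting address is given in hex
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operand))
        locctr += length_of(opcode, operand)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate opcodes and look up addresses (forward references are known now)."""
    listing = []
    for loc, opcode, operand in intermediate:
        if opcode in OPTAB:
            address = symtab[operand] if operand else 0
            listing.append((loc, f"{(OPTAB[opcode] << 16) | address:06X}"))
        elif opcode == "WORD":
            listing.append((loc, f"{int(operand) & 0xFFFFFF:06X}"))
        elif opcode == "BYTE":
            body = operand[2:-1]
            code = body.encode("ascii").hex().upper() if operand[0] == "C" else body.upper()
            listing.append((loc, code))
    return listing


source = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),  # forward reference: RETADR is defined below
    (None, "J", "FIRST"),
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    (None, "END", "FIRST"),
]
symtab, intermediate = pass1(source)
for loc, code in pass2(symtab, intermediate):
    print(f"{loc:04X} {code}")
```

Running it prints 1000 141009, 1003 3C1000 and 1006 454F46; RETADR itself produces no object code because RESW only advances LOCCTR.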
Internal data structures

• the Operation Code Table (OPTAB)


– OPTAB is used to look up mnemonic operation codes
and translate them to their machine language
equivalents.
• the Symbol Table (SYMTAB)
– SYMTAB is used to store values (addresses) assigned to
labels.
• a Location Counter (LOCCTR)
– This is a variable that is used to help in the assignment
of addresses.
Internal Data Structure
[Figure: Source program → Pass 1 of assembler → Intermediate file → Pass 2 of assembler → Object program; OPTAB, SYMTAB and LOCCTR are shown as the data structures used by the passes]
OPTAB
• In most cases, OPTAB is a static table.
• OPTAB must contain the mnemonic operation
code and its machine language equivalent
• In more complex assemblers, OPTAB also
contains information about instruction format
and length.
• OPTAB is usually organized as a hash table,
with mnemonic operation code as the key.
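An OPTAB that also records format and length, as described above, maps naturally onto a Python dict (a hash table keyed by mnemonic). This sketch assumes SIC/XE-style formats, where formats 1 to 3 are 1 to 3 bytes long and a leading "+" selects the 4-byte extended format; the OpInfo class, the lookup function and the small table contents are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInfo:
    opcode: int  # machine-language equivalent
    fmt: int     # instruction format (1, 2 or 3); format 3 may be extended to 4


# Hash table (dict) keyed by mnemonic; a small hypothetical SIC/XE subset.
OPTAB = {
    "FIX": OpInfo(0xC4, 1),
    "CLEAR": OpInfo(0xB4, 2),
    "LDA": OpInfo(0x00, 3),
    "STL": OpInfo(0x14, 3),
}


def lookup(mnemonic: str) -> tuple[int, int]:
    """Return (opcode, instruction length in bytes) for a mnemonic."""
    extended = mnemonic.startswith("+")  # '+' requests format 4
    info = OPTAB[mnemonic.lstrip("+")]
    if extended and info.fmt != 3:
        raise ValueError(f"{mnemonic}: only format 3 instructions can be extended")
    return info.opcode, 4 if extended else info.fmt
```

Because the table is static, it can be built once when the assembler starts; Pass 1 needs only the length, while Pass 2 needs the opcode.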
SYMTAB
• A symbol is basically a name and an address. 
• Symbol table holds information needed to locate and
relocate a program’s symbolic definitions and references.
• This table may also contain information, such as type or
length, about the data area or instruction labeled.
• The symbol table contains an array of symbol entries.
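• SYMTAB is usually organized as a hash table for efficiency
of insertion and retrieval.
– The label is the key of SYMTAB.
– Labels are non-random keys (programmers often choose similar names such as LOOP1 and LOOP2), so the hashing function must still perform well on them.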
LOCCTR
• LOCCTR is a variable.
• LOCCTR is initialized to the beginning address
specified in the START statement.
• After each source statement is processed, the
length of the assembled instruction or data area
to be generated is added to LOCCTR.
• When a label is reached, the current value of
LOCCTR gives the address to be associated with
that label.
Intermediate file
• Pass 1 usually generates an intermediate file that
contains
– each source statement together with its assigned address,
error indicators, etc.
• This file is used as the input to Pass 2.
• This file retains some results of operations performed
during Pass 1
– the scanned operand field for symbols and addressing flags
– pointers into OPTAB and SYMTAB for each operation code
and symbol used.
Algorithm for Pass 1
Algorithm for Pass 1 (cont’d)
Algorithm for Pass 2
Algorithm for Pass 2 (Cont’d)
Linker
• It takes one or more object files or libraries as input and combines
them to produce a single (usually executable) file.
• In doing so, it resolves references to external symbols, assigns final
addresses to procedures/functions and variables, and revises code
and data to reflect new addresses (a process called relocation).
• Tool that merges the object files produced by separate compilation or
assembly and creates an executable file
• Three tasks
– Searches the program to find library routines used by the program, e.g. printf()
and math routines
– Determines the memory locations that code from each module will occupy
and relocates its instructions by adjusting absolute references
– Resolves references among files
Process for producing an executable file
[Figure: each source file is translated by the assembler into an object file; the linker combines these object files with routines from the program library to produce an executable file]
Three processes to run an object program
• Loading
– Brings object program into memory
• Relocation
– Modifies the object program so that it can be loaded at an
address different from the location originally specified
• Linking
– Combines two or more separate object programs and supplies
information needed to allow cross-references.
• “Loader and linker” may be a single system program
– Loader: loading and relocation
– Linker: linking
– A single program that performs all three functions is called a linking loader
Object file and Symbols
• Computer programs typically comprise several parts or
modules; these parts/modules need not all be contained
within a single object file, and in such a case they refer to each
other by means of symbols.
• Typically, an object file can contain three kinds of symbols:
– defined symbols, which allow it to be called by other modules,
– undefined symbols, which refer to the other modules where these
symbols are defined, and
– local symbols, used internally within the object file to facilitate
relocation.
Loader
• Part of the OS that brings an executable file residing on
disk into memory and starts it running
• Steps
– Read the executable file’s header to determine the size of the text and
data segments
– Create a new address space for the program
– Copy instructions and data into the address space
– Copy arguments passed to the program onto the stack
– Initialize the machine registers, including the stack pointer
– Jump to a startup routine that copies the program’s arguments
from the stack to registers and calls the program’s main routine
Types of Loader
Types of loaders:
– Bootstrap loader
– Absolute loader
– Relocating loader
Bootstrap loader
• When a computer is turned on or restarted, a special
type of absolute loader, called bootstrap loader, is
executed.
• The bootstrap loader loads the first program to be run
by the computer – usually an operating system, from
the boot disk (e.g., a hard disk or a floppy disk).
• The bootstrap itself begins at address 0.
• It loads the OS starting at address 0x80.
• There is no Header record or other control information; the object
code is simply a sequence of consecutive bytes to be loaded into memory.
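Because the bootstrap has no record structure to interpret, its core is just "copy bytes, then jump". In this hypothetical sketch a hex string stands in for the boot device, and the function name bootstrap is an assumption; only the load address 0x80 comes from the text above.

```python
def bootstrap(boot_image: str, memory: bytearray, load_at: int = 0x80) -> int:
    """Copy raw object code (hex text, no records) into memory; return the jump address."""
    data = bytes.fromhex(boot_image)  # every pair of hex digits is one byte
    if load_at + len(data) > len(memory):
        raise MemoryError("boot image does not fit in memory")
    memory[load_at:load_at + len(data)] = data
    return load_at  # control transfers to the first byte loaded
```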
The absolute loader

– The loader loads the file into memory at the location specified by the
beginning portion (header) of the file, then passes control to the
program.
– If the memory space specified by the header is
currently in use, execution cannot proceed, and
the user must wait until the requested memory
becomes free.
Absolute loader
• It is very simple.
• All operations are accomplished in a single pass.
• An object program is loaded at the address
specified on the START directive.
• No relocation or linking is needed.
• The loader jumps to the address specified on
the END directive to begin execution of the
loaded program.
Absolute loader

• No linking and relocation needed


• Processing of each record type in the object program:
• Header record
– Check the Header record for program name, starting
address, and length (available memory)
• Text record
– Bring the object program contained in the Text record to the
indicated address
• End record
– Transfer control to the address specified in the End record
Loading an absolute program
Algorithm for an absolute loader
begin
read Header record
verify program name and length
read first Text record
while record type ≠ ‘E’ do
begin
{ if object code is in character form,
convert into internal representation }
move object code to specified location in memory
read next object program record
end
jump to address specified in End record
end
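The pseudocode above translates almost line for line into Python. This sketch assumes the records are strings in the fixed-column format given earlier, without "^" separators, and that memory is a bytearray indexed by absolute address; program-name verification is left out for brevity, and the sample records in the test are illustrative.

```python
def absolute_load(records: list[str], memory: bytearray) -> int:
    """Load an object program (records without '^' separators); return the jump address."""
    header = records[0]
    if header[0] != "H":
        raise ValueError("object program must begin with a Header record")
    start, length = int(header[7:13], 16), int(header[13:19], 16)  # cols 8-13, 14-19
    if start + length > len(memory):
        raise MemoryError("not enough memory at the requested address")
    for record in records[1:]:
        if record[0] == "E":
            return int(record[1:7], 16)  # transfer address from the End record
        if record[0] == "T":
            address = int(record[1:7], 16)  # cols 2-7
            count = int(record[7:9], 16)    # cols 8-9, length in bytes
            # Convert the character (hex) form into internal binary form.
            memory[address:address + count] = bytes.fromhex(record[9:9 + 2 * count])
    raise ValueError("object program has no End record")
```

No address arithmetic happens anywhere: every byte goes exactly where the Text record says, which is why the program can run only at the address fixed when it was assembled.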
Disadvantages of the scheme of
absolute loaders
• This scheme needs the programmer to specify
the actual address at which it will be loaded
into memory.
– This is not a problem when only one program is run, but it becomes
one when several programs must share memory.
• This scheme makes it difficult to use subroutine
libraries efficiently.
– the subroutines must be pre-assigned absolute
addresses.
The relocating loader
• Program relocation means that the object program can be executed in any part of
the available memory that is large enough to hold it.
• The relocating loader will load the program anywhere in memory,
altering the various addresses as required to ensure correct referencing.
• The decision as to where in memory the program is placed is made by the
operating system, not by the program’s header.
• The actual starting address of the object program is not known until load
time.
• Relocation allows a machine with a large memory to be shared efficiently
when several independent programs are run together.
• It also allows subroutine libraries to be used efficiently.
• Loaders that allow for program relocation are called relocating loaders
or relative loaders.
Methods for specifying relocation
• Two methods are available for specifying relocation:
Modification records and relocation bits.
• With Modification records, a Modification record (M) is used
in the object program to specify each relocation.
• With relocation bits, each word of object code is associated
with one relocation bit, and the relocation bits in a Text
record are gathered into a bit mask.
Relocation Loader Using Modification Record

• In the object program, there is one Modification
record for each value that must be changed during
relocation.
• Each modification record specifies the starting
address and length of the field whose value is to be
altered.
Col. 1 M
Col. 2-7 Starting location of the address field to be
modified, relative to the beginning of the program.
Col. 8-9 Length of the address field to be modified, in half-bytes (hex)
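Applying a Modification record amounts to adding the load address to a fixed-width field. This sketch treats the length in columns 8-9 as a count of half-bytes (hex digits), so an odd length such as 05 covers the 20-bit address of a 4-byte instruction and starts in the middle of the first byte. The function names and the sample instruction 4B101036 at relative address 0006 are illustrative assumptions.

```python
def add_to_field(memory: bytearray, address: int, half_bytes: int, delta: int) -> None:
    """Add delta to the rightmost `half_bytes` hex digits starting at `address`."""
    nbytes = (half_bytes + 1) // 2          # an odd length starts mid-byte
    mask = (1 << (4 * half_bytes)) - 1
    span = int.from_bytes(memory[address:address + nbytes], "big")
    field = ((span & mask) + delta) & mask  # wrap around like fixed-width arithmetic
    memory[address:address + nbytes] = ((span & ~mask) | field).to_bytes(nbytes, "big")


def relocate(memory: bytearray, load_address: int, m_records: list[str]) -> None:
    """Apply 'M' records (cols 2-7 relative address, cols 8-9 length in half-bytes)."""
    for record in m_records:
        relative = int(record[1:7], 16)
        half_bytes = int(record[7:9], 16)
        add_to_field(memory, load_address + relative, half_bytes, load_address)


memory = bytearray(0x5000)
memory[0x4006:0x400A] = bytes.fromhex("4B101036")  # program loaded at 4000
relocate(memory, 0x4000, ["M00000705"])
print(memory[0x4006:0x400A].hex().upper())         # 4B105036
```

Masking keeps the flag bits in the high half of the first byte (the 1 in 4B10...) untouched while only the address digits change.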
Object program with relocation
by Modification record
Relocation loader using relocation bit
• A relocation bit associated with each word of object
code is used to indicate whether or not this word
should be changed when the program is relocated.
• The relocation bits are gathered together into a bit
mask following the length indicator in each Text record.
– If the relocation bit corresponding to a word of object code is
set to 1, the program’s starting address is to be added to this
word when the program is relocated.
– A bit value of 0 indicates that no modification is necessary.
Relocation loader using relocation bit
• The bit mask is placed in columns 10-12 of the Text record (T); the format of a Text
record with relocation bits is as follows.
Text record
col 1: T
col 2-7: starting address
col 8-9: length (byte)
col 10-12: relocation bits
col 13-72: object code
• A twelve-bit mask (three hex digits, columns 10-12) is used in each Text record.
Since each Text record contains fewer than 12 words, unused bits are set to 0, and
any value that is to be modified during relocation must coincide with one of
these 3-byte segments.
• For an absolute loader there are no relocation bits; columns 10-69 contain the
object code.
• The object program with relocation by bit mask is shown below. Observe that
FFC (1111 1111 1100) means all ten words are to be modified, and E00 (1110 0000 0000)
means the first three words are to be modified.
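The bit-mask scheme can be sketched as follows: the leftmost mask bit belongs to the first 3-byte word, and each set bit adds the load address to its word. The function name and the sample Text record (four words with mask E00) are illustrative assumptions; the column positions follow the Text record format given above.

```python
def load_with_bit_mask(memory: bytearray, load_address: int, record: str) -> None:
    """Load one Text record carrying a 12-bit relocation mask in cols 10-12."""
    start = int(record[1:7], 16)            # cols 2-7
    length = int(record[7:9], 16)           # cols 8-9, in bytes
    mask = int(record[9:12], 16)            # cols 10-12, one bit per 3-byte word
    base = load_address + start
    memory[base:base + length] = bytes.fromhex(record[12:12 + 2 * length])
    for word in range(length // 3):
        if mask & (1 << (11 - word)):       # leftmost bit belongs to the first word
            at = base + 3 * word
            value = int.from_bytes(memory[at:at + 3], "big")
            memory[at:at + 3] = ((value + load_address) & 0xFFFFFF).to_bytes(3, "big")


memory = bytearray(0x2000)
load_with_bit_mask(memory, 0x1000, "T0000000CE00140033481039000036280030")
print(memory[0x1000:0x100C].hex().upper())  # 141033482039001036280030
```

With mask E00 the first three words are relocated and the fourth (280030) is copied unchanged.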
Object program with relocation
by bit mask
Program Linking
• The goal of program linking is to resolve external
references (EXTREF) and external definitions (EXTDEF)
between different control sections.
• EXTDEF (external definition): the EXTDEF statement
in a control section names symbols, called external
symbols, that are defined in this (present) control
section and may be used by other sections.
• EXTREF (external reference): the EXTREF statement
names symbols that are used in this (present) control section
and are defined elsewhere.
How to implement EXTDEF and EXTREF
• The assembler must include information in the object program that will
cause the loader to insert proper values where they are required, in the
form of Define records (D) and Refer records (R).
Define record
• The format of the Define record (D) along with examples is as shown
here.
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for other external symbols
• Example records
D LISTA 000040 ENDA 000054
D LISTB 000060 ENDB 000070
Refer record(R)
• The format of the Refer record (R) along with examples is
as shown here.
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control
section
Col. 8-73 Names of other external reference symbols
• Example records
R LISTB ENDB LISTC ENDC
R LISTA ENDA LISTC ENDC
R LISTA ENDA LISTB ENDB
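Both record types are plain fixed-column text, so a loader can split them by slicing. In this sketch the example records are written in their fixed-column form (names blank-padded to 6 columns) rather than with the spaces used above for readability; the function names are hypothetical.

```python
def parse_define_record(record: str) -> dict[str, int]:
    """Return {symbol: relative address} from a fixed-column D record."""
    entries = {}
    body = record[1:]                       # cols 2-73: repeated 12-column groups
    for i in range(0, len(body), 12):
        name, address = body[i:i + 6].strip(), body[i + 6:i + 12]
        if name:
            entries[name] = int(address, 16)
    return entries


def parse_refer_record(record: str) -> list[str]:
    """Return the external symbol names from a fixed-column R record."""
    body = record[1:]                       # cols 2-73: repeated 6-column names
    names = (body[i:i + 6].strip() for i in range(0, len(body), 6))
    return [name for name in names if name]


print(parse_define_record("DLISTA 000040ENDA  000054"))  # {'LISTA': 64, 'ENDA': 84}
print(parse_refer_record("RLISTB ENDB  LISTC ENDC  "))   # ['LISTB', 'ENDB', 'LISTC', 'ENDC']
```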
Example To Understand Program Linking &
Relocation
• Here are the three programs named as PROGA, PROGB
and PROGC, which are separately assembled and each
of which consists of a single control section.
• LISTA, ENDA in PROGA, LISTB, ENDB in PROGB and
LISTC, ENDC in PROGC are external definitions in each
of the control sections.
• Similarly LISTB, ENDB, LISTC, ENDC in PROGA, LISTA,
ENDA, LISTC, ENDC in PROGB, and LISTA, ENDA, LISTB,
ENDB in PROGC, are external references.
• Observe the object program for PROGA, which contains D and R records along with other records:
H PROGA 000000 000063
D LISTA 000040 ENDA 000054
R LISTB ENDB LISTC ENDC
.
.
T 000020 0A 03201D 77100004 050014
.
.
T 000054 0F 000014 FFFFF6 00003F 000014 FFFFC0
M000024 05+LISTB
M000054 06+LISTC
M000057 06+ENDC
M000057 06-LISTC
M00005A 06+ENDC
M00005A 06-LISTC
M00005A 06+PROGA
M00005D 06-ENDB
M00005D 06+LISTB
M000060 06+LISTB
M000060 06-PROGA
E000020
• Observe the object program for PROGB, which contains D and R records along with other records:
H PROGB 000000 00007F
D LISTB 000060 ENDB 000070
R LISTA ENDA LISTC ENDC
.
T 000036 0B 03100000 772027 05100000
.
T 000070 0F 000000 FFFFF6 FFFFFF FFFFF0 000060
M000037 05+LISTA
M00003E 05+ENDA
M00003E 05-LISTA
M000070 06+ENDA
M000070 06-LISTA
M000070 06+LISTC
M000073 06+ENDC
M000073 06-LISTC
M000076 06+ENDC
M000076 06-LISTC
M000076 06+LISTA
M000079 06+ENDA
M000079 06-LISTA
M00007C 06+PROGB
M00007C 06-LISTA
E
• Observe the object program for PROGC, which contains D and R records along with other records:
H PROGC 000000 000051
D LISTC 000030 ENDC 000042
R LISTA ENDA LISTB ENDB
.
T 000018 0C 03100000 77100004 05100000
.
T 000042 0F 000030 000008 000011 000000 000000
M000019 05+LISTA
M00001D 05+LISTB
M000021 05+ENDA
M000021 05-LISTA
M000042 06+ENDA
M000042 06-LISTA
M000042 06+PROGC
M000048 06+LISTA
M00004B 06+ENDA
M00004B 06-LISTA
M00004B 06-ENDB
M00004B 06+LISTB
M00004E 06+LISTB
M00004E 06-LISTA
E
Three Programs as They Appear in Memory after Loading and
Linking
Relocation and linking operations
performed on REF4 from PROGA
• For example, the value for REF4 in PROGA is located at
address 4054 (the beginning address of PROGA plus 0054,
the relative address of REF4 within PROGA).
• The following figure shows the details of how this value is
computed.
• The initial value from the Text record
T0000540F000014FFFFF600003F000014FFFFC0 is 000014. To this is
added the address assigned to LISTC, which is 4112 (the beginning
address of PROGC plus 30). The result is 004126.
• That is, REF4 in PROGA is ENDA-LISTA+LISTC = 4054-4040+4112 = 4126.
• Similarly, the load addresses of the symbols are LISTA:
PROGA+0040 = 4040, LISTB: PROGB+0060 = 40C3, and LISTC:
PROGC+0030 = 4112.
Algorithm and Data structures for a Linking
Loader
• A linking loader uses two-pass logic. ESTAB
(external symbol table) is the main data
structure for a linking loader.
– Pass 1: Assign addresses to all external symbols
– Pass 2: Perform the actual loading, relocation,
and linking
Main data structure for a linking
loader (cont’d)

• An external symbol table ESTAB,


– It is used to store the name and address of each external symbol in the set
of control sections being loaded.
– A hash organization is often used for ESTAB.
• A variable, program load address, PROGADDR
– It is the beginning address in memory where the linked program is to be
loaded.
– Its value is supplied by the operating system.
• A variable, control section address, CSADDR
– It contains the starting address assigned to the control section currently
being scanned by the loader.
– This value is added to all relative addresses within the control section to
convert them to actual addresses.
• ESTAB for the example (the three programs PROGA, PROGB and PROGC),
loaded at PROGADDR = 4000, is shown below.
• Each ESTAB entry has four fields: the control section name, a symbol
defined in that control section, its address, and the length of the
control section.

Control section   Symbol   Address   Length
PROGA                      4000      63
                  LISTA    4040
                  ENDA     4054
PROGB                      4063      7F
                  LISTB    40C3
                  ENDB     40D3
PROGC                      40E2      51
                  LISTC    4112
                  ENDC     4124
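Pass 1 of the linking loader can be sketched as a loop that advances CSADDR by each CSLTH. This minimal version assumes the Header and Define records have already been parsed into (name, length, {symbol: relative address}) tuples; the function name build_estab is hypothetical, while the input values are taken from the PROGA/PROGB/PROGC example above, so the output reproduces the ESTAB table.

```python
def build_estab(progaddr: int, sections: list[tuple[str, int, dict[str, int]]]) -> dict[str, int]:
    """Pass 1 of a linking loader: assign addresses to all external symbols."""
    estab: dict[str, int] = {}
    csaddr = progaddr                           # PROGADDR is supplied by the OS
    for name, cslth, defines in sections:       # from each H record and its D records
        for symbol, relative in [(name, 0), *defines.items()]:
            if symbol in estab:
                raise ValueError(f"duplicate external symbol: {symbol}")
            estab[symbol] = csaddr + relative
        csaddr += cslth                         # CSADDR + CSLTH = next CSADDR
    return estab


estab = build_estab(0x4000, [
    ("PROGA", 0x63, {"LISTA": 0x40, "ENDA": 0x54}),
    ("PROGB", 0x7F, {"LISTB": 0x60, "ENDB": 0x70}),
    ("PROGC", 0x51, {"LISTC": 0x30, "ENDC": 0x42}),
])
for symbol, address in estab.items():
    print(f"{symbol:<6} {address:04X}")
```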
Program Logic for Pass 1
• Pass 1 assigns addresses to all external symbols.
• Pass 1 is concerned only with Header and Define records.
• The variables and data structures used during Pass 1 are
PROGADDR (program load address, supplied by the OS), CSADDR
(control section address), CSLTH (control section length)
and ESTAB.
• CSADDR + CSLTH = the next CSADDR
• For each Define record, Pass 1 enters every symbol into ESTAB with
its relative address plus CSADDR.
• The algorithm for Pass 1 of the linking loader is given
below.
Program Logic for Pass 2
• Pass 2 of the linking loader performs the actual loading, relocation, and
linking.
• It uses the Modification records, looking up each symbol in ESTAB to obtain
its address.
• Finally, it uses the End record of the main program to obtain the transfer
address, which is the starting address needed for the execution of the program.
• In Pass 2, as each Text record is read, the object code is moved to the
specified address (plus the current value of CSADDR).
• When a Modification record is encountered, the symbol whose value is
to be used for modification is looked up in ESTAB.
• This value is then added to or subtracted from the indicated location in
memory.
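The Modification step of Pass 2 can be sketched by extending the relocation idea with a signed ESTAB lookup. This sketch assumes a fixed-column layout with the sign in column 10 and the symbol name from column 11 (the listings above insert a space for readability), and the function name is hypothetical. The usage reproduces the REF4 computation from this lecture: 000014 + LISTC (4112) = 004126.

```python
def apply_modification(memory: bytearray, csaddr: int, record: str, estab: dict[str, int]) -> None:
    """Apply one record of the form M<addr:6><len:2><+|-><symbol>, fixed columns."""
    address = csaddr + int(record[1:7], 16)     # relative address + CSADDR
    half_bytes = int(record[7:9], 16)           # field length in half-bytes
    sign = -1 if record[9] == "-" else 1
    value = estab[record[10:].strip()]          # look the symbol up in ESTAB
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    span = int.from_bytes(memory[address:address + nbytes], "big")
    field = ((span & mask) + sign * value) & mask
    memory[address:address + nbytes] = ((span & ~mask) | field).to_bytes(nbytes, "big")


estab = {"LISTC": 0x4112, "ENDC": 0x4124}
memory = bytearray(0x5000)
memory[0x4054:0x4057] = bytes.fromhex("000014")  # REF4 of PROGA, as assembled
apply_modification(memory, 0x4000, "M00005406+LISTC", estab)
print(memory[0x4054:0x4057].hex().upper())       # 004126
```

Intermediate results may wrap past FFFFFF (for REF5, FFFFF6 + ENDC); the mask keeps the field fixed-width, so the later subtraction of LISTC still yields the correct value.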
Dynamic Linking
• Dynamic linking is a scheme that postpones the linking functions until
execution time. A subroutine is loaded and linked to the rest of
the program when it is first called; this is usually called dynamic
linking, dynamic loading or load on call.
• One advantage of dynamic linking is that it allows several
executing programs to share one copy of a subroutine or
library.
• In an object oriented system, dynamic linking makes it
possible for one object to be shared by several programs.
• The actual loading and linking can be accomplished using
operating system service request.
• Dynamic linking provides the ability to load
the routines only when (and if) they are
needed.
– For example, suppose a program contains subroutines
that correct or clearly diagnose errors in the input
data during execution.
– If such errors are rare, the correction and diagnostic
routines may not be used at all during most
executions of the program.
– However, if the program were completely linked
before execution, these subroutines would need to be
loaded and linked every time.
• Dynamic linking avoids the necessity of loading the entire
library for each execution.
– In one method, routines that are to be dynamically loaded must
be called via an operating system (OS) service request.
– The program makes a load-and-call service request to the OS. The
parameter argument (ERRHANDL) of this request is the symbolic
name of the routine to be loaded.
– The OS examines its internal tables to determine whether or not the
routine is already loaded. If necessary, the routine is loaded from
the specified user or system libraries.
– Control is then passed from the OS to the routine being called.
– When the called subroutine completes its processing, OS then
returns control to the program that issued the request.
– If a subroutine is still in memory, a second call to it may not require
another load operation.
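The load-and-call steps above can be simulated with a small class that keeps an "already loaded" table. This is a hypothetical stand-in for the OS service, not a real OS API: a dict of Python callables plays the role of the user or system libraries, and loads counts how often a routine was actually brought in. Only the routine name ERRHANDL comes from the text.

```python
from typing import Callable


class LoadAndCallService:
    """Hypothetical stand-in for the OS load-and-call request described above."""

    def __init__(self, library: dict[str, Callable[..., object]]) -> None:
        self._library = library                               # routines in user/system libraries
        self._loaded: dict[str, Callable[..., object]] = {}  # routines already in memory
        self.loads = 0

    def load_and_call(self, name: str, *args: object) -> object:
        routine = self._loaded.get(name)
        if routine is None:                 # internal tables say: not loaded yet
            routine = self._library[name]   # "load" it from the library
            self._loaded[name] = routine
            self.loads += 1
        return routine(*args)               # control passes to the routine and returns


service = LoadAndCallService({"ERRHANDL": lambda record: f"diagnosed {record!r}"})
print(service.load_and_call("ERRHANDL", "bad input"))
print(service.load_and_call("ERRHANDL", "another"), service.loads)  # second call: no reload
```

If ERRHANDL is never called, it is never loaded, which is exactly the saving dynamic linking offers for rarely used error routines.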
