You are on page 1of 88

System Programming

UNIT I
Basics of system
programming
Goals of This Class
Enforce your programming skill
Get you acquainted with programming
in ‘C’, Linux OS
Make you prepared for University Exam
& entrance exams
Goals of System software

User convenience: Provide convenient


methods of using computer system.

Efficient use: Ensure efficient use of


computer resources.

Non-interference: Prevent interference in


the activities of its users.
Introduction
Software
 Application software usually used by end-user
 It is concerned with the solution of some problem, using
the computer as a tool, instead of how computers
actually work.
 System software
 System software consists of a variety of programs that
support the operation of a computer (ex: text editor,
compiler, debugger)
 One characteristic in which most system software differ
from application software is machine dependency
 A system software programmer must know the target
machine structure(e.g. of assembler)
Introduction
What is the System Software?
 Programs that supports the operation of a computer.

Examples of the System Software


 Text editor, compiler, loader, linker, assembler, macro processor,

operating system.

What is the System Programming?


 Design and implementation of system software.

“By understanding of the system software, you will gain a deeper


understanding of how computers actually works”
System Software
and Machine Architecture
Machine dependent system software
 System programs are to support the operation and use
of the target computer.
 The difference between different machine
 Machine code
 Instruction formats
 Addressing mode
 Registers
Machine independent system software
 General design and logic is basically the same:
 Code optimization
 General design and logic of an assembler
System Software
The system software includes
 Assembler
 Linker

 Loader

 Macro processor

 Text editor

 Compiler

 Operating system

 Debugging system

 Source Code Control System

Need ?
Programming Languages
Machine language:
It is computer’s native language having a sequence of zeroes
and ones (binary). Different computers understand different
sequences. Thus, hard for humans to understand e.g.0101001...

Assembly language:
It uses mnemonics for machine language. In this each instruction
is minimal but still hard for humans to understand:
e.g. ADD AH, BL

High-level languages:
FORTRAN, Pascal, BASIC, C, C++, Java, etc.
Each instruction composed of many low-level instructions,
closer to English. It is easier to read and understand:
e.g. hypot = sqrt(opp*opp + adj * adj);
File Handling
FILE:
 It is defined in the header file stdio.h.
 The link between our program and
operating system is a structure called FILE.
 FILE structure contains information about
file being used , such as current size,
location in memory.
 FILE *fp; where fp is file pointer of type
FILE.
Format of fopen()
FILE *fp;
fp= fopen(“file_name”, “type”);
 File_name: character string that contains name of file to be

opened.
 Type: File Meaning
Type
r Open existing file for reading only
w Opens a new file for writing if exists overwritten
a Opens existing file appending
r+ Read, write, modify existing file
w+ Read, write new file, existing file contents
destroyed
a+ Reading, appending existing file

 NOTE: Returns NULL if file does not exists or if unable to


open the file
Closing file
fclose(fp);
 flcose closes the file to which the file
pointer fp points to. It also writes buffered
data in the file before close is performed.
Character input/output in files
getc(): is used to read characters from a file opened
in read mode by fopen.
 General Format:

getc(fp);
getc() function returns EOF when end of file
occurs or if it encounters an error.
putc():is used to write characters to a disk file that
can be opened using fopen in write mode
 General Format:

putc(c, fp); where c is character to be written


Direct Input/Output
fread(): reads into array ptr upto
nitems data items of size size from
the steram fp and returns the number
of items read. If error returns EOF.
int fread(ptr, size, nitems,fp)
Char *ptr
int size, nitems
FILE *fp
rchar=fread(buf, sizeof(int), 20,input)
fwrite :appends at the most nitems
item of data of size size in the file to
which file pointer fp points to, from the
array to which pointer ptr points.
int fwrite(ptr, size, nitems,fp)
Char *ptr
int size, nitems
FILE *fp
rchar=fwrite(buf, sizeof(char), 80,output)
Program

Copy the contents of text file in to


another text file character by character.
Write program to accept characters
from user and write in to file.
#include<stdio.h>
void main()
{
FILE *fs, *ft;
char c;
fs=fopen("pr1.txt","r");
if(fs==NULL)
{ printf("unable to open file");
exit(0);
}
ft=fopen("pr2.txt","w");
if(ft==NULL)
{
printf("unabl to create new file");
fclose(fs);
exit(0);
}
while(1)
{
c=getc(fs);
if(c==EOF)
break;
putc(c,ft);
}
fclose(fs);
fclose(ft);
getch();
}
Contents

• Language processors
• Data structures for language
processing
• Scanning and parsing
• Assembler
Language Processor
Necessity of Language of processing
Application domain
Execution domain
Semantic gap
Draw back of semantic gap
 Large development time
 Large development effort
 Poor quality of software
A. Language processors

A Language processor is a software


which bridges a specification or
execution gap
Specification Gap Execution Gap

Application Execution
PL Domain
Domain Domain
Language processors

A language translator
A detranslator
A preprocessor
Interpreter
Language Processing Activities

Program generation activities


Program execution activities
1.Program translation
2.Program interpretation
Program Generator
Program generator: It is s/w which accepts the
specification of a program to be generated and
generates a program in target PL
Program Generator Domain

EG between target PLD and ED is


bridged by compiler or interpreter for
the PL.

Specification Gap

Program
Application Target PL Execution
Generator
Domain Domain Domain
Domain
Program generator validates the data
during data entry.
It reduces the testing effort.
Economical than to develop problem
oriented language(large EG).
Employee Name

Address

Age
Program Execution Activities
Characteristics of program
translation model

Program must be translated before it


can be executed.
Translated program may be saved in
the file. The saved program may be
executed repeatedly.
A program may be retranslated
following modifications.
Program Interpretation

Interpreter Memory
PC Source
Program
+
Error Data
Characteristics of program
interpretation model

The source program is retained in the


source form itself i.e. no target program
exists.
A statement is analyzed during its
interpretation.

Qu: Define the term language processor and explain


various language processing activities (8)
comparison
asda asda
Fundamentals of Language
Processing

Language processing
= Analysis of source program
+ synthesis of target program

Lexical rules
Syntax rules
Semantic rules
Fundamentals of Language
Processing
Lexical Analysis:
• Identify lexical units->classification of units->
Tokens->class code and number
a:= b + I;
ID#2 OP#5 ID#3 OP#3 ID#1 OP#10
• ID#2 Stands for identifier occupying entry #2
in the symbol table
• Tokens: keywords, variable or arithmetic
operator
Continue…
Syntax analysis:
String of tokens->statement class->
build IC -> semantic analysis

Qu. Explain Syntax analysis, Lexical analysis and semantic


Analysis with example? (6-2012)

Qu. Why Lexical and syntax analyzer are separated out.


(2-2012)
Phases of a language processor

• Forward references
• Issues concerning memory requirements and
organization of a language processor`

Explain different phases of language processing?


(4-2011)
Forward Reference
A FR of a program entity is a reference
to the entity which precedes its
definition in the program
E. g. PR_PROFIT := (PROFIT * 100)/ COST_PRICE;
-------
MOVER AREG, PROFIT
Long profit;
MULT AREG,100
DIV AREG, COST_PRICE
MOVM AREG, PERCENT_PROFIT
---------------
PERCENT_PROFIT DW 1
COST_PRICE DW 1
PROFIT DW 1
Passes of a Language Processor

Pass I
 Performs analysis of the source program
and note relevant information
Pass II
 Perform synthesis of target program
Intermediate Representation
Desirable Properties of IR
Memory efficiency: IR must be
compact.
Processing efficiency: Efficient
algorithms must exist for constructing
and analyzing the IR.
Ease of use: IR should be easy to
construct and use
Language Processor
Development Tools

What is parsing? Explain LEX and YACC in brief.


Language Processor
Development Tools

Qu. What is Parsing? Explain LEX and YACC in brief.


(8-2011)
Assembly Language
Low level programming language
 Mnemonic operation code
 Symbolic operands
 Data declarations.
Assembly language is machine
dependent.
Assembler

A tool called an assembler translates assembly language


into binary instructions.
Basic Assembler Functions
Translate mnemonic opcode to machine language.
Convert symbolic operands to their
machine(neumeric) values.
Build the machine instructions in proper format.
Convert data constants in to machine representation.
Error checking if provided.
Variables are represented by symbolic names.
Changes can be quickly and easily incorporated with
reassembly.
Assembler maintains several data structures.
Assembly language statements
Imperative statements
 Indicate an action to be performed during the execution.
Declarative statements
 [label] DS <constant>
 [label] DC <value>
 A DS 1
 B DS 200
 ONE DC ‘1’
Assembler directives
 START <constant>
 END [<operand spec >]
Literals
Write the value of a constant operand as a part of
the instruction that uses it, such an operand is called
a literal

Avoid having to define the constant elsewhere in the


program and make up a label for it

A literal is identified with the prefix =, which is


followed by a specification of the literal value

With a literal, the assembler generates the specified


value as a constant at some other memory location.

The address of this generated constant is used as the


target address for the machine instruction
Assembler Directive
Assembler directive instruct the
assembler to perform certain actions
during the assembly of program.
E.g. START <constant>

How literals are processed by assembler which does not


support immediate operands.(4-2012).

What is literal? What is its advantage in assembly


language programming?
Advantages of Assembly
Language

Reduced errors
Faster Translation time
Changes could made easier and Faster
Better than HLL where it is desirable to
use specific architectural features.
Disadvantages
Many instructions are required to
achieve small task.
Programmer requires knowledge of the
processor architecture and instruction
set.
Programs are machine dependent,
requires complete rewrite if hardware is
changed.
Design specification of assembler

Specify the problem.


Define format of data structure.
Specify algorithm.
Determine the processing necessary to
perform the task
Analysis Phase
Isolate the label, mnemonic opcode and
operand fields of statement.

If label is present enter the pair in a new


entry of symbol table.

Check validity of the mnemonic opcode


through a lookup in the Mnemonics table.

Perform the LC processing i.e. update the


value contained in LC by considering the
opcode and operands of the statement.
Synthesis Phase
Obtain the machine opcode
corresponding to the mnemonic from
mnemonic table.

Obtain the address of a memory


operands from the symbol table

Synthesize the machine instruction


Data structures of the assembler
Single Pass Translation
Forward reference.
Back patching.
e.g.MOV AREG, A
TII(instruction add and symbol)
Two pass Translation
Handles forward references.
Design of two pass Assembler
Pass - I
Separate the symbol, mnemonic opcode and
operand fields.
Build the symbol table.
Perform LC processing.
Construct intermediate representation

Pass – II
Synthesize the target program
Pass – I Databases
Input source program
LC: used to keep track of each instructions
location.
MOT: mnemonic and its length.
ST: stores each symbol and its
corresponding location.
LT: stores each literal and its
corresponding assigned location
Data structures of assembler pass I
An assembly program
Algorithm for pass-I of two
pass assembler
Read
If label
 Yes: store label in ST with LC value
 No: Search pseudo OPTAB
Search pseudo OPTAB
 Found: type
 DS, DC : Determine length of data space required and update LC
 End: Go to pass-II
 Not found: Search m/c OPTAB
Search m/c OPTAB(MOVEM AREG,A)
 Get length of instruction
 Process literals update LC
 Read
Assembler First Pass
1. loc_cntr:=0;
POOLTAB[1]:=1;
pooltab_ptr:=1; littab_ptr:=1;
2. While next statement is not END
a) If label is present
this_label = symbol in label field;
Enter(this_label,loc_cntr) in SYMTAB
b) If an LTORG statement then
process literal and update location counter
c) If a START or ORIGIN statement
loc_cntr:= value in operand field
d) If an EQU statement then
i) this_addr:=value of <address spec>
ii) correct the symtab entry for this_label to
(this_label,this_addr)
e) If a declaration statement the
i)code:=codeof the declaration statement
ii)size:=size of memory area req. by DC/DS
iii)loc_cntr:=lcn_cntr+size
iv) Generate IC ’(DC,code)..’
f) If an imperative statement then
i) code:=machine opcode from OPTAB
ii)loc_cntr:=lcn_cntr+instruction length from OPTAB
iii)IF operand is literal then
this_literal:=literal in operand field;
LITTAB[littab_ptr]:=this_literal;
littab_ptr:=littab_ptr+1
else(operand is symbol)
this_entry:=SYMTAB entry number of operand
Generate IC(IS, code)(S, this_entry)
3. (Processing of END statement)
a) Perform step 2b
b) Generate IC ‘(AD,02)’
c) Go to Pass II
Intermediate code
IC is categorized based on:
 Processing efficiency
 Memory economy
Each IC unit consists of following 3
fields:
 Address
 Representation of mnemonic opcode
 Representation of operands
Address Mnemonic Operands
Opcoed
Code for declaration statements and
directives
Intermediate code – variant I
Intermediate code – variant II
Memory requirements using variant I
and variant II
Processing of declarations
• All of the literal operands used in the program
are gathered together into one or more literal
pools
• Normally literals are placed into a pool at the
end of the program
• A LTORG statement creates a literal pool that
contains all of the literal operands used since
the previous LTORG
• Most assembler recognize duplicate literals: the
same literal used in more than one place and
store only one copy of the specified data value
LITTAB (literal table): contains the literal name,
the operand value and length, and the address
assigned to the operand when it is placed in a
literal pool
An assembly program
Two pass assembler
Read statement
Search pseudo opcode table
 Found : check type
 DC,DS: Determine length of data space update LC
 END : clean up and exit
 Not found: Search MOT
Search MOT
 Get instruction length, format type and binary code
 Evaluate operand expression by searching for values of
symbols
 Assemble together the parts of instruction and update LC
Read statement.
Data structures for two pass
assembler
Input: pass I program(IC)
Location counter(LC)
Machine operation table (MOT):
Pseudo-Operation table(POT)
Symbol table(Prepared by Pass I)
Base Table (indicates which registers are
currently specified as base register)
Work space INST : used to hold each
instruction as its various parts(binary op-code,
register field, length field, displacement field)
are being assembled together.
Algorithm: Second pass
Continued
Error reporting in pass - I
Error reporting in pass - II

• Two pass assembler performs program listing and


error reporting in pass II.
• Errors detected in pass I are stored in an error table.
Thank you !!!
Symbol-Defining Statements
Assembler directive that allows the programmer to define symbols
and specify their values
General form: symbol EQU value
Line 133: +LDT #4096
MAXLEN EQU 4096
+LDT #MAXLEN
It is much easier to find and change the value of MAXLEN
Assembler directive that indirect assigns values to symbols ORG

STAB RESB 1100


ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAGS RESW 2
ORG
STAB+1100
Context-Free Grammars
Grammars
Grammars express languages
Example: the English language
sentence  noun _ phrase predicate

noun _ phrase  article noun

predicate  verb
article  a
article  the

noun  cat
noun  dog

verb  runs
verb  walks
80
A derivation of “the dog walks”:
sentence  noun _ phrase predicate
 noun _ phrase verb
 article noun verb
 the noun verb
 the dog verb
 the dog walks

81
More Notation
G  V , T , S , P 
Grammar
V: Set of variables

T: Set of terminal symbols

S: Start variable

P: Set of Production rules


82
Scanning and parsing
Scanning is the process of recognizing
the lexical components in a source
string
Parsing is the process of checking the
validity of a source string, and to
determine its syntactic structure
 Top down parsing
 Bottom up parsing

83
Errors Reporting in Pass-I

Source need not be preserved till pass II.

Conserves memory and avoids duplicate


processing.

Debugging difficult.(mapping of source


statement and target code is difficult )
Enlist the different types of errors that are handled by
Pass- I and pass- II of two pass assembler(6-2012)
Data structures for language
processing
In computer science, a data structure is a particular way of
storing and organizing data in a computer so that it can be used
efficiently.

Different kinds of data structures are suited to different kinds


of applications, and some are highly specialized to specific
tasks. For example, B-trees are particularly well-suited for
implementation of databases, while compiler implementations
usually use hash tables to look up identifiers.

1. Search data structure


2. Allocation data structures
85
Search data structure
Sequential search organization
Binary search organization
Hash table organization
Linked list and tree structured
organization
Nested search structures

86
Allocation data structures
Stacks
 A linear data structure which permits
allocation and deallocation of entities in a
Last-in-first-out (LIFO) manner and only
the last entry is accessible at any time
Heaps
 A nonlinear data structure which permits
allocation and deallocation of entities in a
random order

87
Symbol-Defining Statements
Assembler directive that allows the programmer to define symbols
and specify their values
General form: symbol EQU value
Line 133: +LDT #4096
MAXLEN EQU 4096
+LDT #MAXLEN
It is much easier to find and change the value of MAXLEN
Assembler directive that indirect assigns values to symbols ORG

STAB RESB 1100 STAB RESB 1100


SYMBOL EQU STAB ORG STAB
VALUE EQU STAB+6 SYMBOL RESB 6
FLAGS EQU STAB+9 VALUE RESW 1
FLAGS RESW 2
ORG STAB+1111

You might also like