
Chapter-6

Compiler
A compiler is a system software component that accepts a program written in a high-level language
and produces an object program.

The compiler must perform the following 4 tasks [functions]:


1. Recognize certain strings as basic elements or tokens, i.e., variables, operators,
keywords etc.
2. Recognize combinations of elements as syntactic units and interpret their meaning.
3. Allocate storage and assign locations for all variables in the program.
4. Generate the appropriate object code.

General model of a compiler:


Ex:
WCM: procedure (Rate, Start, finish);
Declare (Cost, Rate, Start, Finish) fixed binary (31) static;
Cost=Rate *(Start- Finish) +2*Rate*(Start-Finish-100);
Return (Cost);
End;
1 Recognizing basic elements as tokens:
Step 1:

“The source program is broken into pieces called tokens.”

• In the figure, each token is represented by a rectangle.

• Tokens are recognized as identifiers, literals (constants), or terminal symbols (operators or keywords).

• In the above example, WCM, Rate, Start and Finish are identifiers.

• Procedure is a keyword.
• : ( , ) ; are terminal symbols.

Step 2:
The basic elements, or tokens, are entered into a table.
The table consists of 2 fields:
1. Uniform symbol
2. Pointer

Uniform symbols are of fixed size, and each points to the table entry of the associated basic element.
Here, the uniform symbol classes are IDN for identifiers, TRM for terminals and LIT for literals.

2 Recognizing syntactic units and interpreting their meaning:


Here, we have to perform 2 separate tasks:
1. Recognize the phrases
2. Interpret their meaning
Step 1:
Recognize the phrases (statements or syntactic constructions):
The compiler checks the validity of each phrase or statement.
If a statement is free of errors, it is declared a valid statement.
Otherwise, the compiler attempts some form of recovery and continues compilation with the
next statement.

Step 2:
Interpreting the meaning of the construction:

After performing the above step the resultant form is “syntactic form”

Step 3:
Intermediate form:
Once the syntactic construction has been determined, the compiler can generate object code for each
construction. It typically first creates an intermediate form of the source program.

The intermediate form depends on the syntactic construction, which is one of:

1. Arithmetic statement
2. Non-arithmetic statement
3. Non-executable statement

1 Arithmetic statements: One intermediate form of an arithmetic statement is a parse tree.
The rules for converting an arithmetic statement into a parse tree are:
a) Any variable is a terminal node of the tree.
b) Every operator forms a binary tree whose left branch is the tree for operand 1 and
whose right branch is the tree for operand 2.

Priority rules for constructing the tree:

1. Highest priority is given to expressions written in brackets.
2. The * and / operators have the second priority.
3. The + and - operators have the third priority.
4. If operators have the same priority, evaluate them from left to right.

Ex:

Another intermediate form is a linear representation of the parse tree, called a matrix.

Matrix number   Operator   Operand1   Operand2
1               -          Start      Finish
2               *          Rate       M1
3               *          2          Rate
4               -          Start      Finish
5               -          M4         100
6               *          M3         M5
7               +          M2         M6
8               =          Cost       M7
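
The construction of the matrix from the arithmetic statement can be sketched in Python. This is only an illustration under stated assumptions, not the method of any particular compiler: the names tokenize and to_matrix are hypothetical, only identifiers, unsigned integers and the operators + - * / ( ) = are handled, and no error checking is done. Each operator found (in priority order, left to right) emits one matrix entry and is then referred to as M1, M2 and so on.

```python
import re

def tokenize(text):
    # Identifiers, unsigned integer literals, and single-character operators.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()=]", text)

def to_matrix(statement):
    """Translate 'target = expression' into (operator, operand1, operand2) rows."""
    tokens = tokenize(statement)
    pos = 0
    matrix = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def emit(op, a, b):
        # Append one matrix entry and return the name that refers to its result.
        matrix.append((op, a, b))
        return f"M{len(matrix)}"

    def factor():
        # Highest priority: a bracketed expression, a variable, or a literal.
        tok = advance()
        if tok == "(":
            result = expr()
            advance()  # consume ")"
            return result
        return tok

    def term():
        # Second priority: * and /, taken left to right.
        left = factor()
        while peek() in ("*", "/"):
            op = advance()
            left = emit(op, left, factor())
        return left

    def expr():
        # Third priority: + and -, taken left to right.
        left = term()
        while peek() in ("+", "-"):
            op = advance()
            left = emit(op, left, term())
        return left

    target = advance()
    advance()  # consume "="
    emit("=", target, expr())
    return matrix
```

Calling to_matrix("Cost = Rate*(Start-Finish) + 2*Rate*(Start-Finish-100)") yields the eight entries of the matrix shown above, in the same order.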

2 Non-arithmetic statements: DO, IF and GO TO are examples of non-arithmetic statements.
These statements can all be replaced by a sequential ordering of individual matrix entries.
Ex: Return (Cost);
End;

Matrix
Operator   Operand1   Operand2
Return     Cost
End

3 Non-executable statements: Non-executable statements such as DECLARE give the compiler
information that clarifies the referencing or allocation of variables and their associated storage.
The information contained in a non-executable statement is entered into tables.

Ex:
Declare (Cost, Rate, Start, Finish) fixed binary (31) static;
The table entries consist of four fields:
1. Variables -> Cost, Rate, Start, Finish
2. Data type -> fixed binary
3. Precision -> 31 bits
4. Storage class -> static

3) Storage allocation:

The proper amount of memory required by the program is reserved at some point in time.

Ex:
Declare (Cost, Rate, Start, Finish) fixed binary (31) static;

Identifier table for the above example:

Name     Base     Scale   Precision   Storage class   Relative address
Cost     Binary   Fixed   31          Static          0
Rate     Binary   Fixed   31          Static          4
Start    Binary   Fixed   31          Static          8
Finish   Binary   Fixed   31          Static          12

The identifier table consists of six fields:

1. Name: the name of the variable
2. Base: binary or decimal
3. Scale: fixed or float
4. Precision: the number of digits and, where applicable, a scale factor
5. Storage class: one of the classes listed below
6. Relative address: the location assigned to the variable within its storage area

Storage classes are:

1. Static
2. Automatic
3. Controlled
4. Based

The storage allocation routine scans the identifier table and assigns a location to each scalar.
Since each variable is fixed binary (31) and occupies 32 bits (4 bytes), relative location 0 is assigned
to the first variable, 4 to the second, 8 to the third and 12 to the fourth.
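
The assignment of relative locations can be sketched as follows. This is a minimal illustration, assuming every variable has the same precision and that each one occupies its precision plus one sign bit, rounded up to whole bytes. The function name assign_static_storage is hypothetical.

```python
def assign_static_storage(names, precision_bits=31):
    """Assign consecutive relative addresses to fixed binary variables."""
    # Precision plus one sign bit, rounded up to whole bytes (31 + 1 bits -> 4 bytes).
    size_bytes = (precision_bits + 1 + 7) // 8
    relative_address = {}
    offset = 0
    for name in names:
        relative_address[name] = offset
        offset += size_bytes
    # The final offset is the total amount of static storage to reserve.
    return relative_address, offset
```

For the names Cost, Rate, Start and Finish this gives the relative addresses 0, 4, 8 and 12, with 16 bytes reserved in total.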

Each variable is 32 bits in size, and the first bit is reserved as the sign bit. The sign bit is
allocated at load time.

Sign Data [Binary or decimal]

These relative addresses are used by the later phases of the compiler for proper accessing. Similarly,
storage is also assigned for the temporary locations that will contain the intermediate results of the
matrix.
Ex: [M1, M2, M3, … M7]
4 Code generation: The code generation phase takes the matrix as input and generates
object code for each entry in it.
The object code associated with each matrix operator is defined in a table called the
code production table.
Ex:
Start - Finish
The operator (-) in the matrix entry is treated as a macro call.
The operands Start and Finish are treated as macro arguments.
Code definition for the operator - (matrix entry form: operator, operand1, operand2; &N is the
number of the matrix entry):
L    1,&operand1
S    1,&operand2
ST   1,M&N
Using this code definition, the following code is generated for the first matrix entry:
L    1,Start
S    1,Finish
ST   1,M1
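
The macro-style expansion of a matrix entry can be sketched as below. This is an illustration only: CODE_PRODUCTIONS and generate are hypothetical names, the store template is written as M&N (an assumption, so that entry 1 stores into M1), and the entry for + using the A (add) instruction is an assumed addition that does not appear in the notes.

```python
# Hypothetical code production table: one list of templates per matrix operator.
CODE_PRODUCTIONS = {
    "-": ["L    1,&OPERAND1", "S    1,&OPERAND2", "ST   1,M&N"],
    "+": ["L    1,&OPERAND1", "A    1,&OPERAND2", "ST   1,M&N"],  # assumed entry
}

def generate(matrix):
    """Expand each (operator, operand1, operand2) entry like a macro call."""
    lines = []
    for n, (op, operand1, operand2) in enumerate(matrix, start=1):
        for template in CODE_PRODUCTIONS[op]:
            line = template.replace("&OPERAND1", operand1)
            line = line.replace("&OPERAND2", operand2)
            line = line.replace("&N", str(n))  # matrix entry number
            lines.append(line)
    return lines
```

generate([("-", "Start", "Finish")]) returns the three lines L 1,Start, S 1,Finish and ST 1,M1.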

Optimization [machine independent]:-

Duplicate entries in the matrix are deleted, and all references to the deleted entries are modified.

Matrix with common subexpressions      Matrix after elimination of common subexpressions
M1   -   Start   Finish                M1   -   Start   Finish
M2   *   Rate    M1                    M2   *   Rate    M1
M3   *   2       Rate                  M3   *   2       Rate
M4   -   Start   Finish                M4   (deleted)
M5   -   M4      100                   M5   -   M1      100
M6   *   M3      M5                    M6   *   M3      M5
M7   +   M2      M6                    M7   +   M2      M6
M8   =   Cost    M7                    M8   =   Cost    M7
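
The elimination of duplicate matrix entries can be sketched as follows. This is a simplified illustration, assuming that no variable is reassigned between the two occurrences of a subexpression (true for the single statement in the example). The function name is hypothetical.

```python
def eliminate_common_subexpressions(matrix):
    """matrix: list of (operator, operand1, operand2); entry i is named M{i+1}.

    Returns the surviving entries as (name, operator, operand1, operand2).
    """
    seen = {}    # (operator, operand1, operand2) -> name of first entry computing it
    alias = {}   # name of a deleted entry -> name of the entry that replaces it
    result = []
    for i, (op, a, b) in enumerate(matrix, start=1):
        name = f"M{i}"
        # Redirect references to entries that have already been deleted.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if op != "=" and key in seen:
            alias[name] = seen[key]   # duplicate: delete it and remember the alias
            continue
        seen[key] = name
        result.append((name, op, a, b))
    return result
```

Applied to the eight-entry matrix of the example, M4 is deleted and M5 becomes - M1 100, leaving seven entries.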

Optimization [machine dependent]:-

This phase reduces both the memory space and the execution time of the object program.
Since both of these factors depend on the machine, this type of optimization is known as
machine-dependent optimization.

Assembly phase:
The code generation phase produces assembly language; the process of generating the
actual machine code from it is known as the assembly phase.
The assembly phase must perform these operations:
1. Resolve label references
2. Calculate addresses
3. Generate binary machine instructions
4. Generate storage
5. Convert literals

General model of compiler:

There are 7 distinct logical problems (phases):
1. Lexical analysis
2. Syntax analysis
3. Interpretation phase
4. Machine-independent optimization
5. Storage assignment
6. Code generation
7. Assembly and output

1 Lexical analysis: Recognition of basic elements, or tokens, and creation of the uniform symbol table.

2 Syntax analysis: Recognition of basic syntactic constructs through the reduction table.

3 Interpretation phase: Definition of exact meaning, and creation of the matrix and tables
by the respective routines [action routines].

4 Machine-independent optimization: Creation of a more optimal matrix [removes duplicate
entries from the matrix].

5 Storage assignment: Makes entries in the matrix that allow code generation to create code
that allocates dynamic storage, and that allow the assembly phase to reserve the proper amount of
STATIC storage.

6 Code generation: A macro processor is used to produce more optimal assembly code.

7 Assembly and output: Resolves symbolic addresses and generates the machine language.

Phases 1 to 4 are machine independent and language dependent, because these phases
determine the syntax and meaning of each statement in the source program. Hence they
depend on the language and are independent of the machine.
Phases 5 to 7 are machine dependent and language independent, because these phases allocate
memory for literals and generate the assembly code, which depends on the machine and is
independent of the language.

The databases used by the compiler are:

1 Source code: The program written by the user.

2 Uniform symbol table: Consists of the tokens, or basic elements, as they appear in the
program. It is created by the lexical analysis phase and given as input to the syntax analysis and
interpretation phases.

3 Terminal table: A permanent table that lists all keywords and special symbols of the
language.

4 Identifier table: Contains all variables in the program and temporary storage [Ex: M1, M2,
M3 … M7], together with the information needed to reference or allocate storage for them. This table is
created by lexical analysis.

5 Literal table: Contains all constants in the program.

6 Reductions: A permanent table of decision rules, in the form of patterns to be matched against
the uniform symbol table to discover syntactic structure.

7 Matrix: The intermediate form of the program, created by the action routines. It is
optimized and then used for code generation.

8 Code productions: A permanent table of definitions. There is one entry defining the code for
each matrix operator.

9 Assembly code: The assembly language version of the program, created by the code
generation phase and given as input to the assembly phase.

10 Relocatable object code: The final output of the assembly phase, ready to be used as input to
the loader.

Phases of compiler

1 Lexical phase:
The lexical phase performs the following three tasks:
1. Recognize the basic elements, or tokens, present in the source code
2. Build a literal table and an identifier table
3. Build a uniform symbol table
Databases:
The lexical phase involves the manipulation of 5 databases:
1. Source program
2. Terminal table
3. Literal table
4. Identifier table
5. Uniform symbol table

1 Source program: The original form of the program created by the user.
2 Terminal table: A permanent database consisting of 3 fields:

Symbol   Indicator   Precedence

• Symbol: operators, keywords and separators [ ( , ; : ]
• Indicator: the value is Yes or No
  Yes => operators, separators
  No => keywords
• Precedence: used in later phases

Index   Symbol      Indicator   Precedence
1       :           Yes
2       ;           Yes
3       (           Yes
4       )           Yes
5       ,           Yes
6       *           Yes
7       Declare     No
8       Procedure   No
9       +           Yes
10      -           Yes
11      *           Yes
12      Rate        No
13      Start       No
14      Finish      No

3 Literal table:
It describes all literal constants used in the source program.
It consists of 6 fields:

Literal   Base   Scale   Precision   Other information   Address

Other information and address are filled in by later phases.

Ex:
Literal   Base      Scale   Precision   Other information   Address
31        Decimal   Fixed   2
2         Decimal   Fixed   1
100       Decimal   Fixed   3

4 Identifier table:
It describes all identifiers used in the source program. It consists of three fields:
1. Name
2. Data attributes
3. Address

Data attributes and address are filled in by later phases.

Name     Data attributes   Address
WCM
RATE
START
FINISH
COST

5 Uniform symbol table:

The uniform symbol table represents the program as a string of tokens rather than of individual
characters. There is one uniform symbol for every token in the program.
Each uniform symbol consists of 2 fields: table class and index.

Table class   Index   Token
IDN           1       WCM
TRM           1       :
TRM           8       Procedure
TRM           3       (
IDN           2       Rate
TRM           5       ,
IDN           3       Start
TRM           5       ,
IDN           4       Finish
TRM           4       )
TRM           2       ;

Algorithm:
Step 1: The first task of the lexical analysis algorithm is to parse the input character string into
tokens.
Step 2: The second task is to make the appropriate entries in the tables.
Implementation:
1 The input string is separated into tokens by break characters. Break characters are denoted by
the contents of a special field in the terminal table.
2 Lexical analysis recognizes 3 types of tokens:
1. Terminal symbols [TRM]
2. Identifiers [IDN]
3. Literals [LIT]
3 Each token is classified and entered as follows:
If the symbol is in the TERMINAL table then
    Create a uniform symbol of type TRM
Else if the symbol belongs in the IDENTIFIER table then
    Create a uniform symbol of type IDN
Else
    Create a uniform symbol of type LIT
End if
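
The classification above can be sketched in Python. This is a minimal illustration under stated assumptions: TERMINALS is a hypothetical list holding entries 1 to 10 of the terminal table, keywords are matched without regard to case, and a token is taken to be a name, an unsigned integer, or a single special character. The function name lexical_analysis is also hypothetical.

```python
import re

# Entries 1 to 10 of the terminal table, in index order (index = position + 1).
TERMINALS = [":", ";", "(", ")", ",", "*", "DECLARE", "PROCEDURE", "+", "-"]

def lexical_analysis(source):
    """Return (uniform_symbols, identifier_table, literal_table)."""
    identifiers, literals, uniform = [], [], []
    # Assumed token shapes: names, unsigned integers, single special characters.
    for token in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if token.upper() in TERMINALS:
            uniform.append(("TRM", TERMINALS.index(token.upper()) + 1))
        elif token[0].isalpha() or token[0] == "_":
            if token not in identifiers:      # enter each identifier only once
                identifiers.append(token)
            uniform.append(("IDN", identifiers.index(token) + 1))
        elif token.isdigit():
            if token not in literals:         # enter each literal only once
                literals.append(token)
            uniform.append(("LIT", literals.index(token) + 1))
        else:
            raise ValueError(f"unrecognized symbol: {token!r}")
    return uniform, identifiers, literals
```

For the line WCM: procedure (Rate, Start, Finish); this produces the same sequence of table classes and indexes as the uniform symbol table shown earlier.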

2 Syntax Phase:
The functions of the syntax phase are:
1. To recognize the major constructs of the language
2. To call the appropriate action routines that will generate the intermediate form, or matrix,
from the constructs

Databases:
1 Uniform symbol table: The table created by the lexical phase.
The uniform symbols are the source of input to the stack, which is used by the syntax and
interpretation phases.
Each entry has the fields: table class, index.

2 Stack: The stack is the collection of uniform symbols currently being worked on. It is
organized as a LIFO (last in, first out) structure.

3 Reduction table: The syntax rules of the source language are contained in the reduction table.
The general form of a reduction (rule) is:
Label: old top of stack / action routines / new top of stack / next reduction

Algorithm:
Step 1: Reductions are tested consecutively for a match between the old top of stack field and the
actual top of the stack, until a match is found.
Step 2: When a match is found, the action routines specified in the action field are executed in
order from left to right.
Step 3: When control returns to the syntax analyzer, it modifies the top of the stack to agree with
the new top of stack field.
Step 4: Step 1 is repeated, starting with the reduction specified in the next reduction field.
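
The four steps can be sketched with a very small reduction table. Everything in this sketch is a simplifying assumption rather than the textbook's reduction set: the table recognizes only sums such as A + B + C ; and the placeholders <opnd> (any operand), <result> (the entry just created) and <next> (read the next uniform symbol) are hypothetical conventions.

```python
# Hypothetical reduction table for sums such as  A + B + C ;
# (label, old top of stack, action routine, new top of stack, next reduction)
REDUCTIONS = [
    ("EXPR", ("<opnd>", "+", "<opnd>"), "add", ("<result>",), "EXPR"),
    ("",     ("<opnd>", ";"),           None,  (),            None),
    ("",     (),                        None,  ("<next>",),   "EXPR"),
]

def matches(pattern, stack):
    """True if the pattern agrees with the top len(pattern) stack symbols."""
    if len(pattern) > len(stack):
        return False
    top = stack[len(stack) - len(pattern):]
    return all(p == s or (p == "<opnd>" and s not in ("+", ";"))
               for p, s in zip(pattern, top))

def parse(tokens):
    tokens = list(tokens)
    stack, matrix = [], []
    labels = {r[0]: i for i, r in enumerate(REDUCTIONS) if r[0]}
    i = labels["EXPR"]
    while True:
        _, old_top, action, new_top, next_label = REDUCTIONS[i]
        if not matches(old_top, stack):
            i += 1                                # step 1: test the next reduction
            continue
        result = None
        if action == "add":                       # step 2: run the action routine
            matrix.append(("+", stack[-3], stack[-1]))
            result = f"M{len(matrix)}"
        del stack[len(stack) - len(old_top):]     # step 3: install the new top of stack
        for symbol in new_top:
            if symbol == "<result>":
                stack.append(result)
            elif not tokens:
                raise SyntaxError("unexpected end of input")
            else:
                stack.append(tokens.pop(0))       # "<next>": read a uniform symbol
        if next_label is None:
            if stack:
                raise SyntaxError("symbols left on the stack")
            return matrix
        i = labels[next_label]                    # step 4: continue at the named reduction
```

parse(["A", "+", "B", "+", "C", ";"]) returns the two matrix entries (+, A, B) and (+, M1, C). The last reduction has an empty old top of stack, so it always matches and serves as the default that reads the next symbol.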

3. Interpretation Phase:
Databases:
1. Uniform symbol table
2. Stack
3. Identifier table
4. Matrix
The above-mentioned databases are described in the textbook, page 210.

4. Optimization Phase:
Optimizations performed by a compiler are of 2 types:

1. Machine-dependent optimization:
It is related to the machine instructions that get generated, so it is incorporated into the
code generation phase.
2. Machine-independent optimization:
It is not related to the machine instructions. It is used to increase the efficiency of the
code and to reduce the number of lines of code.
Databases:
• Matrix
• Identifier table
• Literal table
These are described in the textbook, page 217.
Machine-independent code optimization:
Ex: A = 2 * 276 / 92 * B
See the textbook, page 219.
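
One machine-independent optimization that this example lends itself to is computing constant subexpressions at compile time: evaluated left to right, 2 * 276 / 92 is 6, so the statement reduces to A = 6 * B. The sketch below is an illustration under assumptions, not the textbook's algorithm: fold_constants is a hypothetical name, literals are unsigned integers held as strings, and a division is folded only when it is exact.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def fold_constants(matrix):
    """matrix: list of (operator, operand1, operand2); entry i is named M{i+1}.

    Entries whose operands are both constants are computed now and deleted.
    Returns the surviving entries as (name, operator, operand1, operand2).
    """
    known = {}   # name of a deleted entry -> its constant value (as a string)
    result = []
    for i, (op, a, b) in enumerate(matrix, start=1):
        a = known.get(a, a)
        b = known.get(b, b)
        if op in OPS and a.isdigit() and b.isdigit():
            # Fold a division only when it is exact, to avoid changing the result.
            if op != "/" or (int(b) != 0 and int(a) % int(b) == 0):
                known[f"M{i}"] = str(OPS[op](int(a), int(b)))
                continue
        result.append((f"M{i}", op, a, b))
    return result
```

For the matrix of A = 2 * 276 / 92 * B, entries M1 and M2 are computed at compile time and deleted, and M3 becomes * 6 B.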
Machine-dependent code optimization:
Ex: A = B + C + D
See the textbook, page 224.

6. Code generation:
The purpose of code generation is to produce the appropriate code. In this phase the matrix is
the input database.
Databases:
• Matrix
• Identifier table
• Literal table
• Code productions
Ex: Code generation with machine-dependent optimization.
A = B + C + D
See the textbook, page 224.

Draw an overview flowchart of a compiler depicting the passes. (or)

Explain the passes of a compiler.
Passes of a compiler
The above diagram depicts a flowchart of a compiler.
Pass 1:
It corresponds to the lexical analysis phase of the compiler. It scans the source program and
creates the identifier, literal and uniform symbol tables.
Pass 2:
It corresponds to the syntax and interpretation phases. Pass 2 scans the uniform symbol table and
produces the matrix.
Pass 3 through Pass N-3 (up to Pass 4 when N = 7):
They correspond to the optimization phase.
Pass N-2 (Pass 5):
It corresponds to the storage assignment phase.
Pass N-1 (Pass 6):
It corresponds to the code generation phase. It scans the matrix.
Pass N (Pass 7):
It corresponds to the assembly and output phase.

What is Cross Compiler?


Def:
A cross compiler is a compiler capable of creating executable code for a platform other
than the one on which the compiler is running.
A cross compiler is necessary to compile for multiple platforms from one machine. A platform
could be infeasible for a compiler to run on, such as for the microcontroller of an embedded
system because those systems contain no operating system.
Cross compilers are not to be confused with source-to-source compilers. A cross compiler is for
cross-platform software development of binary code, while a source-to-source "compiler" just
translates from one programming language to another in text code.
Uses of cross compilers
• Embedded computers
• Compiling for multiple machines
• Use of virtual machines

What is a Linker, and what are the Functions of a Linker?

• The linker is a software program that binds many object modules together to make a
single object program.
• The two kinds of linking are static linking and dynamic linking.

What is Dead code?


In computer programming, dead code is a section in the source code of a program which is
executed but whose result is never used in any other computation. The execution of dead code
wastes computation time and memory.
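
A small illustration of this definition, using a hypothetical function: the perimeter is computed on every call, but its value is never used, so that statement is dead code and can be removed without changing the result.

```python
def rectangle_area(width, height):
    # Dead code: this value is computed on every call but never used afterwards.
    perimeter = 2 * (width + height)
    return width * height
```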

Explain CRT [ Cross Reference Table ].


The cross reference table is a data structure that replaces a run-time lookup computation with a
much simpler lookup operation.
The gain in processing speed can be significant because retrieving a value from memory is much
faster than performing a database or other connector lookup.
Cross reference table lookups are most often performed in map functions,
but can also be used as parameter values in all process steps that use parameters, such as
connectors (including the Start shape), Decision, Set Properties, Message, Program Command,
and Exception shapes.
Some common uses of a cross reference table are:
• A simple value translation between System A and System B, such as item codes, units of
measure, status codes, or any other type of code
• Reusable translations (for example, U.S. state abbreviations)
• Switch/case logic (simple if/else)
• Atom-specific map default values
• Atom-specific connection default values (Start shape criteria)
• To parameterize any process step with Atom-specific values for deployment to multiple
locations or customers
The cross reference table is easy to use and requires no coding. It is comprised of a set of data
elements (or values) that are organized using a model of rows and columns.

Cross Reference Table Example: One Input and One Output

A cross reference table can be used to accept one input value and produce one output value. The
example below shows a cross reference table lookup within a function. In addition to setting up
the function, you need to map all input elements from the source profile to the input values in the
function. You also need to map all output values from the function to the output elements in the
destination profile.
For example, when referring to the U.S. states:
• System A uses the State Name value
• System B uses the FIPS Alpha Code value
When mapping from System A to System B, you need to translate the State Name value to the
FIPS Alpha Code value. The SQL Select statement for the Output Element would be: SELECT
FIPS_Alpha_Code FROM State_Cross-Reference_Example WHERE State Name =
Input_Element. If the State Name=Alabama in System A then the FIPS Alpha Code=AL for
System B. "AL" is the value that will be returned in the output.
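
The same one-input, one-output lookup can be sketched as an in-memory table. This is only an illustration: the dictionary holds a few sample rows, and the names STATE_CROSS_REFERENCE and fips_alpha_code are hypothetical.

```python
# Sample rows of a State Name -> FIPS Alpha Code cross reference table.
STATE_CROSS_REFERENCE = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
}

def fips_alpha_code(state_name):
    """Return the FIPS Alpha Code for a State Name, or None if there is no row."""
    return STATE_CROSS_REFERENCE.get(state_name)
```

fips_alpha_code("Alabama") returns "AL", matching the example above. A State Name with no row returns None.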
