
Language Translator

A translator, or programming language processor, is a generic term that can refer to a compiler, assembler, or interpreter: anything that converts code from one language into another.
These include translations between high-level, human-readable computer languages such as C++ and Java, intermediate-level languages such as Java bytecode, low-level languages such as assembly language and machine code, and between similar levels of language on different computing platforms, as well as from any of these to any other of these.
• Types of Language Translators

• Compiler
• Interpreter
• Assembler
Language processing systems (using Compiler) – Cousins of Compilers
 High Level Language – A program that contains pre-processor directives such as #include or #define is written in a high-level language (HLL). HLLs are closer to humans but far from machines. These (#) tags are called pre-processor directives; they tell the pre-processor what to do.
 Pre-Processor – Removes all the #include directives by including the referenced files (file inclusion) and expands all the #define directives (macro expansion). It deals with macro processing, augmentation, file inclusion, language extension, etc.
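A minimal C sketch (a hypothetical file of our own, not from the slides) of what the pre-processor does with these directives:

/* before pre-processing */
#include <stdio.h>               /* file inclusion: the text of stdio.h is pasted in here */
#define PI 3.14159               /* macro definition */
#define AREA(r) (PI*(r)*(r))     /* function-like macro */

int main(void)
{
    /* macro expansion rewrites the next line to: printf("%f\n", (3.14159*(2.0)*(2.0))); */
    printf("%f\n", AREA(2.0));
    return 0;
}

After the pre-processor runs, no # directive remains; the compiler proper never sees PI or AREA.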

 Assembler – For every platform (hardware + OS) there is a separate assembler; assemblers are not universal, since each platform has its own. The assembler translates assembly language into machine code, and its output is called an object file.
Language processing systems (using Compiler) –Cousins of Compilers

Interpreter – An interpreter converts high-level language into low-level machine language, just like a compiler, but the two differ in how they read the input.
The compiler reads the input in one go, does the processing, and executes the source code, whereas the interpreter does the same line by line. A compiler scans the entire program and translates it as a whole into machine code, whereas an interpreter translates the program one statement at a time. Interpreted programs are usually slower than compiled ones.
Language processing systems (using Compiler) –Cousins of Compilers
Relocatable Machine Code – Code that can be loaded at any address and run; the addresses within the program are expressed in such a way that the code can be moved around in memory.

Loader/Linker – Converts the relocatable code into absolute code and tries to run the program, resulting in a running program or an error message (or sometimes both).

– The linker combines a variety of object files into a single executable file; the loader then loads it into memory and executes it. A linker is a computer program that links and merges various object files together in order to make an executable file. These files might have been produced by separate assemblers. The major task of a linker is to search for and locate the referenced modules/routines in a program and to determine the memory locations where these codes will be loaded, so that the program instructions have absolute references. The loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data), creates memory space for it, and initializes various registers to initiate execution.
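A small two-file C sketch (the file names main.c and count.c are our own) of the kind of cross-file reference a linker resolves: each file is compiled/assembled into its own relocatable object file, and the linker patches the address of counter when merging them into one executable.

/* main.c */
extern int counter;      /* declared here, defined in some other object file */
int main(void)
{
    counter = 42;        /* the address of counter is unknown until link time */
    return counter;
}

/* count.c */
int counter;             /* the definition the linker searches for and locates */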
COMPILER vs ASSEMBLER
• A compiler converts the source code written by the programmer into a machine-level language; an assembler converts assembly code into machine code.
• A compiler takes source code as input; an assembler takes assembly language code as input.
• A compiler converts the whole code into machine language at a time; an assembler cannot do this at once.
• A compiler is more intelligent than an assembler.
• The compilation phases are lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator, code optimizer, code generator, and error handler; an assembler makes two passes over the given input (a first phase and a second phase).
• The output of a compiler is a mnemonic version of machine code; the output of an assembler is binary code.
• C, C++, Java, and C# are examples of compiled languages; GAS (the GNU assembler) is an example of an assembler.
Introduction
• A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language.
• It is software that converts a program written in a high-level language (source language) into a low-level language (object/target/machine language).
• A compiler also reports errors present in the source program as part of its translation process.
• There is an agreement on the format for object (or assembly) code.

High Level Language (Source Program) → Compiler → Low Level Language (Target Program)
                                          ↓
                                  Compilation Errors
Introduction
• Cross Compiler:
• A compiler that runs on a machine 'A' and produces code for another machine 'B'.
• It is capable of creating code for a platform other than the one on which the compiler is running.
• Source-to-source Compiler (transcompiler or transpiler):
• A compiler that translates source code written in one programming language into source code of another programming language.
Single pass Compiler
 Processes the input exactly once.
 Command interpreters such as bash/sh/tcsh can be considered single pass compilers.
 All phases (lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, target code generation) are in a single module.
 Faster and smaller than a multi-pass compiler.
 Less efficient in comparison with a multi-pass compiler.
 Cannot optimize very well, because at any moment it sees only a limited portion of the program text.
Two pass Compiler
 First pass
o Referred to as the front end.
o It is the analysis part of the compiler.
o Produces platform-independent code.
o The output of the first pass is three-address code.
o First-pass phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation.
 Second pass
o The back end of the compiler.
o The synthesis part – takes the three-address code as input and converts it into a low-level/assembly language.
o Platform dependent – tied to the target machine/system.
Two pass Compiler
 With a multi-pass compiler, we can solve two basic problems:
o Designing compilers for different programming languages for the same machine (the back end can be reused).
o Designing a compiler for the same programming language for different machines/systems (the front end can be reused).
Type               Single Pass Compiler    Multi Pass Compiler
Speed              Fast                    Slow
Memory Required    More                    Less
Time               Less                    More
Portability        No                      Yes
Major Parts of Compilers (Two Parts of the Compilation Process): Analysis and Synthesis
Analysis Part (front end):
 Input: source program
 Output: intermediate code/representation
Synthesis Part (back end):
 Input: intermediate code
 Output: final target machine program

 The front end maps legal code into IR; the back end maps the IR onto the target machine.
 This front-end/back-end division simplifies retargeting and allows multiple front ends.
 Multiple passes -> better code.
Structure/ Steps/ Components / Phases of / Architecture of a Compiler
– Each phase transforms the source program from one representation into another.
– The phases communicate with the error handler.
– The phases communicate with the symbol table.
Steps of a Compiler

Phase                         Input                       Output
Lexical Analyzer              HLL stream of characters    Tokens
Syntax Analyzer               Tokens                      Parse tree
Semantic Analyzer             Parse tree                  SDT (annotated tree)
Intermediate Code Generator   SDT                         Three-address code
Code Optimizer                Three-address code          Optimized code
Target Code Generator         Optimized code              Assembly language (low-level code) / machine code
Structure/ Steps/ Components / Phases of / Architecture of a Compiler

Analysis Phase – An intermediate representation is created from the given source code:
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate Code Generator

The lexical analyzer divides the program into "tokens"; the syntax analyzer recognizes "sentences" in the program using the syntax of the language; the semantic analyzer checks the static semantics of each construct; and the intermediate code generator generates "abstract" code.
Structure/ Steps/ Components / Phases of / Architecture of a Compiler

Synthesis Phase – An equivalent target program is created from the intermediate representation. It has two parts:
• Code Optimizer
• Code Generator

The code optimizer optimizes the abstract code, and the final code generator translates the abstract intermediate code into specific machine instructions.
Structure/ Steps/ Components / Phases of / Architecture of a Compiler

Error Recovery
• A parser should be able to detect and report any error in the program.
• When an error is encountered, the parser is expected to handle it and carry on parsing the rest of the input.
• Errors are mostly expected to be caught by the parser, but they may be encountered at various stages of the compilation process.
• A program may have the following kinds of errors at various stages:
• Lexical: the name of some identifier typed incorrectly
• Syntactic: missing semicolon or unbalanced parentheses
• Semantic: incompatible value assignment
• Logical: unreachable code, infinite loop
Symbol Table:
• A data structure created and maintained by the compiler in order to track the occurrence of various entities such as:
• Variable and function names
• Objects
• Classes
• Interfaces
• Information is gathered from the analysis phase and used in the synthesis phase.
Lexical Analyzer (Scanner, Tokenizer)

• The lexical analyzer reads/scans the source program character by character and returns the stream of tokens of the source program.
• Maps characters into tokens (the basic units of syntax).
• Eliminates white space (tabs, blanks) and removes comments from source programs.
• If the lexical analyzer finds a token invalid, it generates an error and reports the line on which the error occurs (if any).
• Handles pre-processor directives: #define, #include, macro expansion, file inclusion.
Lexical Analyzer
A token
• describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
• Tokens are the terminal symbols of the source program's grammar.
Lexemes
• are sequences of characters (alphanumeric) forming an instance of a token.
• These are specific instances of tokens.
• There are predefined rules for every lexeme to be identified as a valid token.
• Lexemes are matched against patterns.
Pattern
• These rules are defined by the grammar, by means of a pattern.
• A pattern is a rule describing all the lexemes that can represent a particular token in a source language.
• A pattern explains what can be a token, and these patterns are defined by means of regular expressions.
Lexeme Vs Token

Lexeme: a sequence of characters from the input that matches a pattern.
Token: a symbolic name for the class of entities that make up the text of the program. Examples: ID, Constant, Keyword, Operator, Punctuation, String literal.

Example: consider the statement int X=5;
Lexeme    Token
int       Keyword
X         ID
=         Operator
5         Constant
;         Punctuation
Lexical Analyzer

e.g. newval := oldval + 12
Tokens: newval, oldval (identifiers); :=, + (operators); 12 (number)

Consider the statement
count = count + temp;
which, token-wise, is id = id + id;
Tokens: id, operator, punctuation
Lexemes: count, temp, =, +, ;

e.g. 31+28-59
Tokens: number [0-9]*, operator
Lexemes: 31, 28, 59 (numbers); +, - (operators)

For example, in C, the variable declaration line
int value = 100;
contains the tokens: int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).

[Three sample programs, labelled (1), (2), and (3) on the original slide, are not reproducible here.]

Q. How many tokens are in program 1: 26 or 32?
Q. How many tokens are in program 2: 25 or 30?
Q. How many tokens are in program 3: 33 or ...?
Lexical Analyzer

• Regular expressions are used to describe tokens (lexical constructs).
• A (deterministic) finite state automaton can be used in the implementation of a lexical analyzer.
• A key issue is speed, so instead of using a tool like LEX it is sometimes necessary to write your own scanner.
• The lexical analyzer works closely with the syntax analyzer: it reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer on demand.
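A minimal hand-written scanner sketch in C (our own simplification, not LEX output) that simulates a small DFA grouping characters into identifier, number, and operator tokens:

#include <stdio.h>
#include <ctype.h>

void scan(const char *s)
{
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; }               /* eliminate white space */
        else if (isdigit((unsigned char)*s)) {                 /* DFA state: inside a number */
            const char *start = s;
            while (isdigit((unsigned char)*s)) s++;
            printf("NUMBER   %.*s\n", (int)(s - start), start);
        } else if (isalpha((unsigned char)*s) || *s == '_') {  /* DFA state: inside an identifier */
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            printf("ID       %.*s\n", (int)(s - start), start);
        } else {                                               /* single-character operator/punctuation */
            printf("OP       %c\n", *s++);
        }
    }
}

int main(void) { scan("count = count + 12;"); return 0; }

Running it prints the token stream ID count, OP =, ID count, OP +, NUMBER 12, OP ; — exactly the stream the syntax analyzer would consume.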
Lexical Analyzer
Lexical Error:
A lexical analyzer may not be able to proceed if no rule/pattern for the tokens matches a prefix of the remaining input.
Possible recovery actions:
• Deleting an extraneous character
• Inserting a missing character
• Transposing two adjacent characters
• Replacing an incorrect character with the correct one
• Deleting successive characters from the input
Syntax Analyzer
• A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.
• The syntax analyzer puts information about identifiers into the symbol table.
• Syntax analysis or parsing is the second phase of a compiler.
• A syntax analyzer is also called a parser.
• A parse tree describes the syntactic structure.

 A lexical analyzer can identify tokens with the help of regular expressions and pattern rules.
 But a lexical analyzer cannot check the syntax of a given sentence, due to the limitations of regular expressions.
 Regular expressions cannot check balancing of tokens, such as parentheses.
 Therefore, this phase uses a context-free grammar (CFG), which is recognized by push-down automata.
Syntax Analyzer
• newval := oldval + 12
Syntax Analyzer
• The syntax of a language is specified by a context-free grammar (CFG).
• A CFG is a helpful tool for describing the syntax of programming languages.
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the rules implied by a CFG.
• If it does, the syntax analyzer creates a parse tree for the given program.
Example
We take the palindrome language, which cannot be described by means of a regular expression.
That is, L = { w | w = wR } is not a regular language. But it can be described by means of a CFG, as illustrated below:
G = ( V, Σ, P, S )
Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ε | Z → 0Q0 | N → 1Q1 }
S = { Q }
This grammar describes the palindrome language, generating strings such as 1001, 11100111, 00100, 1010101, 11111, etc. (the productions Q → 0 and Q → 1 provide the centre symbol of the odd-length palindromes).
Syntax Analyzer
Syntax Analyzer versus Lexical Analyzer

Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?
 Both of them do similar things, but the lexical analyzer deals with the simple, non-recursive constructs of the language.
 The syntax analyzer deals with the recursive constructs of the language.
 The lexical analyzer simplifies the job of the syntax analyzer.
 The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
 The syntax analyzer works on those smallest meaningful units (tokens) to recognize meaningful structures in our programming language.
Syntax Analyzer versus Lexical Analyzer
Parse Tree
• A parse tree is a graphical depiction of a derivation.
• It is a convenient way to see how strings are derived from the start symbol.
• The start symbol of the derivation becomes the root of the parse tree.
• In a parse tree:
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• In-order traversal gives the original input string.
• Note: A parse tree depicts the associativity and precedence of operators.
The deepest sub-tree is traversed first, so the operator in that sub-tree gets precedence over the operators in the parent nodes.
Why does a Grammar become Ambiguous?
• Causes such as left recursion, common prefixes, etc. can make a grammar ambiguous.
• Removing these causes may convert the grammar into an unambiguous grammar.
• However, this is not always guaranteed.
Ambiguous Grammar: a grammar is ambiguous if it creates more than one parse tree for the same string (e.g., id+id*id).

[The slide shows two alternative parse trees, Option 1 and Option 2, for id+id*id.]

Here we have not taken care of the associativity and precedence of the operators, so ambiguity is generated.

In one pair of trees, parse tree 2 violates the associativity rule, so it is wrong: +, -, * generally follow left associativity, but the evaluation of parse tree 2 yields right associativity. Problem: associativity not taken care of.

In the other pair, parse tree 1 violates the precedence rule, so it is wrong: * has higher precedence than +, but here the + operation is applied before *. Note: higher-precedence operators should appear at lower levels of the parse tree, but here the * operator is at a higher level than +. Problem: precedence not taken care of.
Methods To Remove Ambiguity
• The ambiguity from the grammar may be removed using the
following methods-
1. By fixing the grammar
2. By adding grouping rules
3. By using semantics and choosing the parse that
makes the most sense
4. By adding the precedence rules or other context
sensitive parsing rules
Removal of Ambiguity
1. Follow Associativity
2. Precedence constraints
Removing Ambiguity By Precedence & Associativity Rules-
• An ambiguous grammar may be converted into an unambiguous
grammar by implementing-
• Precedence Constraints
• The precedence constraint is implemented using the following rules-
• The level at which the production is present defines the priority of the operator
contained in it.
• The higher the level of the production, the lower the priority of operator.
• The lower the level of the production, the higher the priority of operator.
• Associativity Constraints
• If the operator is left associative, induce left recursion in its production.
• If the operator is right associative, induce right recursion in its production.

e.g.
• +, -, * are left associative, so the grammar should be written in left-recursive form to remove ambiguity.
• Exponentiation is right associative, so the grammar should be written in right-recursive form.
Associativity rule Violation and Solution in CFGs
[The slide contrasts an ambiguous grammar with a modified, unambiguous one.]
Since + and * are left associative in nature, we design the grammar the same way, in left-recursive form, which solves the problem of ambiguity.

We should restrict the growth of the parse tree to the left direction, which always preserves the left associativity of the + operator.

Here E→E+id is a left-recursive production.
To satisfy left associativity we use left-recursive production rules; similarly, to ensure right associativity we use right-recursive production rules.
Precedence rule Violation and Solution in CFGs
[The slide contrasts an ambiguous grammar with a modified, unambiguous one.]
Since + and * are left associative in nature, we design the grammar in left-recursive form, which solves the problem of ambiguity.

Precedence is preserved by the modified grammar.

In the above example, both the associativity and precedence rules are preserved by the given grammar. + and * are left associative, so the productions E→E+T and T→T*F are both in left-recursive form.
Similarly, exponentiation is right associative, so the production F→G^F is in right-recursive form.
Likewise, if you derive any string from the given grammar, the precedence rules are also satisfied: ^ is derived at a lower level of the parse tree than *, and, in the same way, * is derived at a lower level of the parse tree than +.
Problem of Left Recursion

A left-recursive production such as A → Aα causes an infinite loop in a top-down parser: the procedure for A calls itself immediately, without consuming input or checking any condition. With the right-recursive form A → αA there is no infinite-loop problem, because the parser can first check the condition using the value of α (i.e., consume input) before the recursive call.
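A C sketch of the same point (the grammar and helpers are ours for illustration): for A → αA the parser consumes input before recursing, so the recursion terminates; for A → Aα the procedure would call itself first and loop forever.

#include <stdio.h>

static const char *input = "aaa";
static int  lookahead(void) { return *input; }
static void match(int c)    { if (*input == c) input++; }

/* A -> 'a' A : check/consume the terminal first, then recurse -- terminates */
void A_right(void)
{
    if (lookahead() == 'a') { match('a'); A_right(); }
}

/* A -> A 'a' would be coded as:
       void A_left(void) { A_left(); match('a'); }
   the first statement recurses without consuming anything: infinite loop */

int main(void)
{
    A_right();
    printf("input fully consumed: %s\n", *input ? "no" : "yes");
    return 0;
}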
Non-Deterministic CFGs Vs Deterministic CFGs

In a non-deterministic CFG a backtracking problem may arise. For example, if A has several alternatives that all begin with the same terminal α (say A → αβ1 | αβ2 | αβ3), then on seeing α the parser cannot decide which alternative to use: to derive αβ3 there are three possible moves. In a deterministic CFG each lookahead selects at most one alternative.
Parser
• A parser is a program that generates a parse tree for the given string, if the string can be generated from the underlying grammar.

Top-down parser:
• Construction of the parse tree starts from the root and proceeds to the children (the string is derived from non-terminals).
• The decision is: what to use? Which production should be applied?
• Uses LMD (leftmost derivation).
• Less powerful parser.
• If there are multiple choices, a problem may arise (backtracking needs to be done, which is a problem).
• Scans the string left to right, one symbol at a time.

Bottom-up parser:
• Construction of the parse tree starts from the bottom and proceeds to the root (starts from terminals).
• Shift Reduce Parser (SR parser).
• Shift (push), Reduce (pop) – a stack is used.
• The decision is: when to reduce?
• Follows RMD (rightmost derivation) in reverse order.
Top down parser
• In order to construct TDP the CFG should not have
• Left Recursion
• Non-determinism
• Ambiguity
Top Down Parsing : Recursive Descent Parsing
• Recursive descent is a top-down parsing technique that constructs the parse tree from the top, with the input read from left to right.
• It is built from a set of mutually recursive procedures: it uses a procedure/function for every non-terminal entity.
• Each procedure implements one non-terminal of the grammar.
• This parsing technique recursively parses the input to build a parse tree, which may or may not require backtracking.
• But the grammar associated with it (if not left factored) cannot avoid backtracking.
• This parsing technique is regarded as recursive because it uses a context-free grammar, which is recursive in nature.
• A form of recursive descent parsing that does not require any backtracking is known as predictive parsing.
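A runnable recursive descent sketch in C (the grammar is our own choice, already free of left recursion): E → T E', E' → + T E' | ε, T → id, with one procedure per non-terminal and the letter i standing in for the token id.

#include <stdio.h>
#include <stdlib.h>

static const char *in;                               /* input cursor */

static void reject(void)  { printf("reject\n"); exit(1); }
static void match(char c) { if (*in == c) in++; else reject(); }

static void Eprime(void);

static void T(void) { match('i'); }                  /* T  -> id         */

static void Eprime(void)                             /* E' -> + T E' | ε */
{
    if (*in == '+') { match('+'); T(); Eprime(); }   /* choose by lookahead */
    /* otherwise E' -> ε : consume nothing */
}

static void E(void) { T(); Eprime(); }               /* E  -> T E'       */

int main(void)
{
    in = "i+i+i";                                    /* stands for id+id+id */
    E();
    printf(*in == '\0' ? "accept\n" : "reject\n");
    return 0;
}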
Top Down Parsing : Recursive Descent Parsing
A TDP cannot parse a grammar in left-recursive form, so the left recursion must be eliminated first (for example, E→E+T | T is rewritten with a new non-terminal E' as E→TE', E'→+TE' | ε).
Back-tracking in Top down parsing
• Top-down parsers start from the root node (start symbol) and match the input string against the production rules, replacing non-terminals (if matched).
• To understand this, take the following example CFG:
• S → rXd | rZd, X → oa | ea, Z → ai
• For the input string "read", a top-down parser first tries S → rXd with X → oa, fails on the mismatch, backtracks, and then succeeds with X → ea.
Predictive Parser - LL(1)
• A predictive parser is a recursive descent parser that has the capability to predict which production is to be used to replace the input string.
• The predictive parser does not suffer from backtracking.
• To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next input symbol.
• To make the parser backtracking-free, the predictive parser puts some constraints on the grammar and accepts only the class of grammars known as LL(k) grammars.
• Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the stack and the input contain an end symbol $ to denote that the stack is empty and the input is consumed. The parser refers to the parsing table to take decisions on the input and stack element combination.
• In recursive descent parsing, the parser may have more than one production to choose from for a single instance of input, whereas in a predictive parser each step has at most one production to choose.
LL(1) Parser – Predictive Parser
• Non-recursive descent – LL(1)

First L – scan the string from left to right
Second L – follow LMD (leftmost derivation)
1 – number of lookahead symbols (only one symbol is looked ahead)

Two functions are used:
1) FIRST()
2) FOLLOW()
Rules to construct a Predictive LL(1) parser
• Calculate FIRST and FOLLOW
• FIRST() / leading function
• FOLLOW() / trailing function
• Build the predictive parsing table using FIRST and FOLLOW
• Parse the input string with the help of the table
• E.g. construction of a predictive LL(1) parser:
• FIRST, FOLLOW
• Parse table
• Stack implementation
• Parse the input string
Rules to construct a Predictive LL(1) parser
• FIRST and FOLLOW: the FIRST and FOLLOW sets are needed so that the parser can apply the proper production rule at the correct position.
• FIRST Function
• FIRST(X) is the set of terminal symbols that can begin the strings derived from X: the first terminals that can be generated by the production.
• E.g. A → abc | def | ghi gives FIRST(A) = {a, d, g}
• E.g. S → aABCD, A → b, B → c, C → d, D → ε gives FIRST(S) = {a}, FIRST(A) = {b}, FIRST(B) = {c}, FIRST(C) = {d}, FIRST(D) = {ε}
• Rules:
1. For a production X → ε, FIRST(X) = {ε}
2. For any terminal symbol a, FIRST(a) = {a}
3. For a production rule X → Y1Y2Y3:
   1. If ε ∉ FIRST(Y1), then FIRST(X) = FIRST(Y1)
   2. If ε ∈ FIRST(Y1), then FIRST(X) = (FIRST(Y1) − {ε}) ∪ FIRST(Y2Y3)
Rules to construct a Predictive LL(1) parser
• FOLLOW Function
• FOLLOW(X) is the set of terminal symbols that can appear immediately to the right of X.
• E.g. S → aAc, A → b | bd gives FOLLOW(A) = {c}, FOLLOW(S) = {$}
• E.g. S → ABCD, A → b | ε, B → c, C → d, D → e gives FOLLOW(S) = {$}, FOLLOW(A) = {c}, FOLLOW(B) = {d}, FOLLOW(C) = {e}, FOLLOW(D) = {$}
• Rules:
• For the start symbol S, place $ in FOLLOW(S)
• For any production rule A → αB, FOLLOW(B) = FOLLOW(A)
• For any production rule A → αBβ:
• If ε ∉ FIRST(β), then FOLLOW(B) = FIRST(β)
• If ε ∈ FIRST(β), then FOLLOW(B) = (FIRST(β) − {ε}) ∪ FOLLOW(A)
Rules to construct a Predictive LL(1) parser
Grammar        FIRST        FOLLOW
S→ABCDE        {a,b,c}      {$}
A→a|ε          {a,ε}        {b,c}
B→b|ε          {b,ε}        {c}
C→c            {c}          {d,e,$}
D→d|ε          {d,ε}        {e,$}
E→e|ε          {e,ε}        {$}

Grammar        FIRST        FOLLOW
S→Bb|Cd        {a,b,c,d}    {$}
B→aB|ε         {a,ε}        {b}
C→cC|ε         {c,ε}        {d}
Rules to construct a Predictive LL(1) parser
Grammar        FIRST          FOLLOW
E→TE'          {id,(}         {$,)}
E'→+TE'|ε      {+,ε}          {$,)}
T→FT'          {id,(}         {+,$,)}
T'→*FT'|ε      {*,ε}          {+,$,)}
F→id|(E)       {id,(}         {*,+,$,)}

Grammar        FIRST          FOLLOW
S→ACB|CbB|Ba   {d,g,ε,h,b,a}  {$}
A→da|BC        {d,g,h,ε}      {h,g,$}
B→g|ε          {g,ε}          {$,a,h,g}
C→h|ε          {h,ε}          {g,$,b,h}
Identify whether the given grammar can be parsed by a Predictive LL(1) parser or not
Grammar        FIRST      FOLLOW
S→AB           {a,b,ε}    {$}
A→a|ε          {a,ε}      {$,b}
B→b|ε          {b,ε}      {$}

Parse table for the LL(1) parser:
        a        b        $
S       S→AB     S→AB     S→AB
A       A→a      A→ε      A→ε
B       --       B→b      B→ε

Since every cell of the parse table has a single entry, the given grammar can be parsed by an LL(1) parser.
Identify whether the given grammar can be parsed by a Predictive LL(1) parser or not
Grammar        FIRST      FOLLOW
S→aSA|ε        {a,ε}      {$,c}
A→c|ε          {c,ε}      {$,c}

Parse table for the LL(1) parser:
        a        c            $
S       S→aSA    S→ε          S→ε
A       ---      A→ε, A→c     A→ε

Since the cell [A, c] has multiple entries, the given grammar cannot be parsed by an LL(1) parser.
Parsing in LL(1) - Example
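The worked example on the original slide is an image that cannot be reproduced here; in its place, a sketch in C of the table-driven LL(1) algorithm for the grammar S→AB, A→a|ε, B→b|ε from the earlier slide, with its parse table hard-coded:

#include <stdio.h>
#include <string.h>

/* productions: 0: S->AB   1: A->a   2: A->ε   3: B->b   4: B->ε */
static const char *rhs[] = { "AB", "a", "", "b", "" };

/* table[nonterminal][terminal] = production number, -1 = error
   rows S,A,B ; columns a,b,$ */
static const int table[3][3] = {
    {  0, 0, 0 },    /* S */
    {  1, 2, 2 },    /* A */
    { -1, 3, 4 },    /* B */
};
static int row(char nt) { return nt == 'S' ? 0 : nt == 'A' ? 1 : 2; }
static int col(char t)  { return t  == 'a' ? 0 : t  == 'b' ? 1 : 2; }

int main(void)
{
    const char *input = "ab$";          /* $ marks the end of input */
    char stack[64] = "$S";              /* $ at the bottom, start symbol on top */
    int top = 1, ip = 0;

    for (;;) {
        char X = stack[top], a = input[ip];
        if (X == '$' && a == '$') { printf("accept\n"); return 0; }
        if (X == a) { top--; ip++; }    /* terminal on top matches input: pop & advance */
        else if (X == 'S' || X == 'A' || X == 'B') {
            int p = table[row(X)][col(a)];
            if (p < 0) { printf("reject\n"); return 1; }
            top--;                      /* pop X, push its RHS in reverse */
            for (int i = (int)strlen(rhs[p]) - 1; i >= 0; i--)
                stack[++top] = rhs[p][i];
        }
        else { printf("reject\n"); return 1; }
    }
}

On input ab$ the trace is: $S becomes $BA (by S→AB), A→a matches a, B→b matches b, and $ on $ accepts.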
Bottom-Up Parser (SR Parser)
• Bottom-up parsing starts from the leaf nodes of a tree and works in upward
direction till it reaches the root node. Here, we start from a sentence and then
apply production rules in reverse manner in order to reach the start symbol.
• Shift-Reduce Parsing
• Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps
are known as shift-step and reduce-step.
• Shift step:
• The shift step refers to the advancement of the input pointer to the next input symbol, which is
called the shifted symbol.
• This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse
tree.
• Reduce step:
• When the parser finds a complete grammar rule (RHS) and replaces it with the (LHS), this is known as a reduce-step.
• This occurs when the top of the stack contains a handle. To reduce, a POP operation is performed on the stack, which pops off the handle and replaces it with the LHS non-terminal symbol.
Types of Bottom-up (SR) Parsers
1. Operator Precedence Parser:
• Simple, but handles only a small class of grammars.
Note: an ambiguous grammar can be parsed only by this parser.
2. LR Parsers (scan the string from left to right / reverse of the rightmost derivation):
• LR(0)
• SLR(1) – Simple LR
• LALR(1) – Look-ahead LR
• CLR(1) – Canonical LR
[The slide shows the classes nested by power: SLR ⊂ LALR ⊂ CLR ⊂ CFG]
Bottom-Up Parser : LR Parser
• The LR parser is a non-recursive, shift-reduce, bottom-up parser.
• It uses a wide class of context-free grammar which makes it the most
efficient syntax analysis technique.
• LR parsers are also known as LR(k) parsers, where
• L stands for left-to-right scanning of the input stream;
• R stands for the construction of right-most derivation in reverse, and
• k denotes the number of lookahead symbols to make decisions.
Bottom-Up Parser : LR Parser
• There are three widely used algorithms available for constructing an LR
parser:
• SLR(1) – Simple LR Parser:
• Works on the smallest class of grammars
• Few states, hence a very small table
• Simple and fast construction
• LR(1) – Canonical LR Parser:
• Works on the complete set of LR(1) grammars
• Generates a large table and a large number of states
• Slow construction
• LALR(1) – Look-Ahead LR Parser:
• Works on an intermediate size of grammar
• The number of states is the same as in SLR(1)
Bottom-Up Parser : Operator Precedence Parser
• For a small class of CFGs, the principles of operator precedence can be used to build a simple bottom-up parser.
• An operator grammar is a CFG in which:
• No RHS of any production is empty (ε)
• No two non-terminals are adjacent
• Examples of RHS forms: AB – not allowed (two adjacent non-terminals); A+B, A*B, A/B – allowed.
• A shift-reduce parser can easily be constructed for this kind of grammar; it is called an operator precedence parser.
Operator Precedence Parser
1. EE+E/ E*E /id – operator Precedence Grammar
2. EEAE/id
A+/* Not an Operator Grammar AB  Not Operator Grammar
A+B Operator Precedence
A*B Operator Precedence
A/B Operator Precedence

But It can be Convert to Operator Grammar as


E E+E/E*E/id
A+/*
Operator Precedence Parser

Operator grammar + input string → operator precedence parser → parse tree
Operator Precedence Parser: Parsing Actions
• Add the $ symbol at both ends of the given input string.
• Scan the input string from left to right until the first > is encountered.
• Scan back towards the left over all equal precedences until the leftmost < is encountered.
• Everything between the leftmost < and the rightmost > is a handle.
• $ on $ means parsing is successful.
Operator Precedence Parser

E.g. With the help of the following grammar, parse the input string "id+id*id":
T→T+T | T*T | id

Steps to solve:
1. Check whether the grammar is an operator grammar
2. Build the operator precedence relation table
3. Parse the given string
4. Generate the parse tree

Precedence relation table (A means Accepted):
       +    *    id   $
+      >    <    <    >
*      >    >    <    >
id     >    >    -    >
$      <    <    <    A

Basics: id (and operands a, b, c) have the highest precedence; $ has the lowest; + > + and * > * (left associativity); id has no relation to id; $ on $ means accept.
Stack      Relation  Input        Comment
$          <         id+id*id$    Shift id
$id        >         +id*id$      Reduce T→id
$T         <         +id*id$      Shift +
$T+        <         id*id$       Shift id
$T+id      >         *id$         Reduce T→id
$T+T       <         *id$         Shift *
$T+T*      <         id$          Shift id
$T+T*id    >         $            Reduce T→id
$T+T*T     >         $            Reduce T→T*T
$T+T       >         $            Reduce T→T+T
$T         A         $            Accept

If the relation is < then shift; if the relation is > then reduce.
LL Vs LR Parser
LL                                               LR
Does a leftmost derivation.                      Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack.   Ends with the root nonterminal on the stack.
Ends when the stack is empty.                    Starts with an empty stack.
Uses the stack for designating what is still     Uses the stack for designating what is
to be expected.                                  already seen.
Builds the parse tree top-down.                  Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack,   Tries to recognize a right-hand side on the
and pushes the corresponding right-hand side.    stack, pops it, and pushes the corresponding
                                                 nonterminal.
Expands the non-terminals.                       Reduces the non-terminals.
Reads the terminals when it pops one off         Reads the terminals while it pushes them on
the stack.                                       the stack.
Pre-order traversal of the parse tree.           Post-order traversal of the parse tree.
Semantic Analyzer
• It checks whether the constructed parse tree follows the rules of the language, e.g., whether assignment of values is between compatible data types.
• It also keeps track of identifiers, their types, and expressions.
• A semantic analyzer checks the source program for semantic errors and collects type information for code generation.
• The semantics of a language give meaning to its constructs, like tokens and syntax structures. Semantics help interpret symbols, their types, and their relations with each other.
• Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.
Semantic Analyzer
• Type checking is an important part of the semantic analyzer.
• Normally, semantic information cannot be represented by the context-free language used in syntax analysis.
• The context-free grammar used in syntax analysis is integrated with attributes (semantic rules);
• the result is a syntax-directed translation,
• an attribute grammar.
Semantic Analyzer
E.g.
int a;            Symbol table: a: int
double sum;                     sum: double
char b;                         b: char
sum = a + b;      This is incorrect: data type mismatch.
This is syntactically correct but semantically incorrect.

For example:
int a = "value";
• should not issue an error in the lexical or syntax analysis phase, as it is lexically and structurally correct, but it should generate a semantic error, as the type of the assignment differs.
• These rules are set by the grammar of the language and evaluated in semantic analysis. The following tasks should be performed in semantic analysis:
• Scope resolution
• Type checking
• Array-bound checking
Type checking and its types
• Type checking is the process of verifying that each operation executed
in a program respects the type system of the language.
• This generally means that all operands in any expression are of
appropriate types and number.
• Semantic Checks
• Static – done during compilation
• Dynamic – done during run-time
• Process of designing a type checker
• Identify the types that are available in the language
• Identify the language constructs that have types associated with them
• Identify the semantic rules for the language
Type checking and its types
• Static type checking
• Done at compile time.
• Type information is obtained via declarations and stored in a master symbol table.
• After this information is collected, the types involved in each operation are checked.
• Examples of static checks:
• Type checks
• Flow-of-control checks
• Uniqueness checks
• Name-related checks
• Dynamic type checking
• Implemented by including type information for each data location at runtime.
• For example, a variable of type double would contain both the actual double value and some kind of tag indicating "double type".
Type Systems
• Strongly Typed Vs Weakly Typed Systems
• A type system is a collection of rules for assigning type expressions.
• A sound type system eliminates run-time checking for type errors.
• A programming language is strongly typed if every program its compiler accepts will execute without type errors.

• Uses of type checking
• Assignments: when a variable is given a value by an assignment, it must be verified that the type of the value is the same as the declared type of the variable.
• Overloading: the same name is used for several different operations over several different types.
• Polymorphic types: some languages allow a function to be polymorphic, that is, defined over a large class of similar types, e.g., over all arrays no matter what the types of the elements are.
• Data structures: a data structure may define a value with several components, or a value that may be of different types at different times.
Type Checking of Statements
Type Conversion
• The process of converting a value of one data type (integer, string, float, etc.) to another data type is called type conversion.
• There are two kinds of type conversion (Python, for example, has both):
• Implicit Type Conversion
• Conversion of a lower data type (e.g., integer) to a higher data type (e.g., float) to avoid data loss.
• Done automatically by the compiler, which converts one data type to another.
• No user/programmer involvement.
• No data loss.
• Explicit Type Conversion
• Also called type casting, because the user casts (changes) the data type of an object.
• Users convert the data type of an object to the required data type.
• User defined.
• May use pre-built classes/functions.
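The slide names Python, but the same two kinds of conversion exist in C; a minimal sketch:

#include <stdio.h>

int main(void)
{
    int i = 7;

    /* implicit conversion: the compiler promotes int to double automatically, no data loss */
    double d = i + 0.5;        /* i becomes 7.0 before the addition */

    /* explicit conversion (type casting): the programmer forces the conversion */
    int t = (int)d;            /* 7.5 -> 7 : the fractional part is lost */

    printf("%f %d\n", d, t);   /* prints 7.500000 7 */
    return 0;
}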
Syntax Directed Definitions/ Translations
• Syntax-directed translation (SDT) refers to a method of compiler
implementation where the source language translation is completely
driven by the parser.
• The parsing process and parse trees are used to direct semantic
analysis and the translation of the source program.
• We can augment grammar with information to control the semantic
analysis and translation. Such grammars are called attribute
grammars.
• We can associate information with a language construct by attaching
attributes to the grammar symbols.
• A syntax directed definition specifies the values of attributes by
associating semantic rules with the grammar productions
Syntax Directed Definitions/ Translations
• An SDD is a context-free grammar with attributes and semantic rules (SDD = CFG + semantic rules).
• Attributes are associated with grammar symbols, and semantic rules are associated with productions.
• Attributes may be of many kinds: numbers, types, table references, strings, etc.
• Example: for a production A → BC, a semantic rule such as A.val = B.val + C.val defines the attribute val of A in terms of the attributes of B and C.
Syntax Directed Definitions/ Translations
• The general approach to syntax-directed translation is to construct a
parse tree or syntax tree and compute the values of attributes at the
nodes of the tree by visiting them in some order
• SDDs are highly readable and give high-level specifications for
translations. But they hide many implementation details. For
example, they do not specify order of evaluation of semantic actions.
• Syntax-Directed Translation Schemes (SDT) embeds program
fragments called semantic actions within production bodies. SDTs are
more efficient than SDDs as they indicate the order of evaluation of
semantic actions associated with a production rule.
Syntax Directed Definitions/ Translations
• Attributes can be of two types:
• In a syntax-directed definition, two kinds of attributes are used: synthesized attributes and inherited attributes.
• An attribute is a synthesized attribute if its parse tree node value is determined by the attribute values at the child nodes.
• An attribute is an inherited attribute if its parse tree node value is determined by the attribute values at the parent and/or sibling nodes.
Syntax Directed Definitions/ Translations
• Synthesized attributes
• A synthesized attribute at node N is defined only in terms of attribute values of the children of N and of N itself.
• For example, if A → BC is a production of a grammar and A's attribute depends on B's attributes or C's attributes, then it is a synthesized attribute.
• Inherited attributes
• An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings.
• For example, if A → BC is a production of a grammar and B's attribute depends on A's attributes or C's attributes, then it is an inherited attribute.
Types of attributes in SDD
• Synthesized Attribute
• A node takes its value from its children.
• A → BCD: here A is the parent node and B, C, D are child nodes.
• A.val = B.val, A.val = C.val, A.val = D.val
• Here the parent A takes its value from its children.
• Inherited Attribute
• A node takes its value from its parent or its siblings.
• A → BCD
• C.i = A.i (from the parent)
• C.i = B.i (from the left sibling)
• C.i = D.i (from the right sibling)
S.NO  Synthesized Attributes                         Inherited Attributes
1.    An attribute is a synthesized attribute if     An attribute is an inherited attribute if its
      its parse tree node value is determined by     parse tree node value is determined by the
      the attribute values at the child nodes.       attribute values at the parent and/or siblings.
2.    The production must have a non-terminal        The production must have a non-terminal as a
      as its head.                                   symbol in its body.
3.    A synthesized attribute at node n is defined   An inherited attribute at node n is defined only
      only in terms of attribute values at the       in terms of attribute values of n's parent, n
      children of n and n itself.                    itself, and n's siblings.
4.    It can be evaluated during a single            It can be evaluated during a single top-down
      bottom-up traversal of the parse tree.         and sideways traversal of the parse tree.
5.    Synthesized attributes can be contained by     Inherited attributes cannot be contained by
      both terminals and non-terminals.              both; they are only contained by non-terminals.
6.    Synthesized attributes are used by both        Inherited attributes are used only by
      S-attributed and L-attributed SDTs.            L-attributed SDTs.
Types of SDD
• S-Attributed SDD / S-attributed Grammar
• An SDD that uses only synthesized attributes.
• A → BCD with A.s = B.s, A.s = C.s, A.s = D.s
• Semantic actions are always placed at the right end of the productions.
• It is also called a postfix SDD.
• Attributes are evaluated with bottom-up parsing.

• L-Attributed SDD / L-Attributed Grammar
• An SDD that uses both synthesized and inherited attributes, but each inherited attribute is restricted to inherit from the parent or left siblings.
• A → XYZ
• Y.s = A.s (OK)
• Y.s = X.s (OK)
• Y.s = Z.s (not OK)
• Semantic actions may be placed anywhere on the RHS.
• Attributes are evaluated by traversing the parse tree depth-first, left to right.
• Top-down parsing.
Identify whether the given SDD is L-attributed or not

• A → LM { L.i = A.i; M.i = L.s; A.s = M.s }
• A → QR { R.i = A.i; Q.i = R.s; A.s = Q.s }

• L.i = A.i → inherited; M.i = L.s → inherited; A.s = M.s → synthesized
• R.i = A.i → inherited; Q.i = R.s → ??; A.s = Q.s → synthesized

• The first SDD is L-attributed, but the second is not, because in Q.i = R.s, Q takes its value from its right sibling.
S-attributed and L-attributed SDT.
• S-attributed SDT :
• If an SDT uses only synthesized attributes, it is called as S-attributed SDT.
• S-attributed SDTs are evaluated in bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.
• Semantic actions are placed in rightmost place of RHS.
• L-attributed SDT:
• If an SDT uses both synthesized attributes and inherited attributes with a
restriction that inherited attribute can inherit values from left siblings only, it
is called as L-attributed SDT.
• Attributes in L-attributed SDTs are evaluated by depth-first and left-to-right
parsing manner.
• Semantic actions are placed anywhere in RHS.
S-attributed and L-attributed SDT.
• For example,
A  XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar since Y.S = A.S and Y.S = X.S are allowed
but Y.S = Z.S violates the L-attributed SDT definition as attributed is
inheriting the value from its right sibling.
If a definition is S-attributed, then it is also L-attributed but NOT vice-
versa.
S-attributed and L-attributed SDT.
• Example – Consider the given below SDT.
P1: S  MN {S.val= M.val + N.val}
P2: M  PQ {M.val = P.val * Q.val and P.val =Q.val}

• Select the correct option.


A. Both P1 and P2 are S attributed.
B. P1 is S attributed and P2 is L-attributed.
C. P1 is L attributed but P2 is not L-attributed.
D. None of the above

• Explanation –
The correct answer is option C: in P1, S.val is a synthesized attribute, and in an L-attributed definition synthesized attributes are allowed, so P1 follows the L-attributed definition.
• But P2 does not follow the L-attributed definition, because P depends on Q, which is its right sibling.
Example of S-attributed SDD
Syntax Directed Translation
• Syntax Directed Translation are augmented rules to the grammar that
facilitate semantic analysis.
• SDT involves passing information bottom-up and/or top-down the
parse tree in form of attributes attached to the nodes.
• Syntax directed translation rules use
• 1) lexical values of nodes,
• 2) constants &
• 3) attributes associated to the non-terminals in their definitions.
• The general approach to Syntax-Directed Translation is to construct a
parse tree or syntax tree and compute the values of attributes at the
nodes of the tree by visiting them in some order.
• In many cases, translation can be done during parsing without
building an explicit tree.
Applications of SDT(Syntax Directed Translation)
• Executing Arithmetic Expression
• Conversion from infix to postfix
• Conversion form infix to prefix
• Conversion from binary to decimal
• Counting number of reductions
• Creating syntax tree
• Generating intermediate code
• Type Checking
• Storing type info into symbol table
Syntax Directed Translation
Example
E → E+T | T
T → T*F | F
F → INTLIT

This is a grammar to syntactically validate an expression having additions and multiplications in it.
Now, to carry out semantic analysis, we will augment SDT rules to this grammar in order to pass some information up the parse tree and check for semantic errors, if any. In this example we will focus on evaluation of the given expression, as we don't have any semantic assertions to check in this very basic example.
Syntax Directed Translation
E  E+T { E.val = E.val + T.val } PR#1
ET { E.val = T.val } PR#2
T  T*F { T.val = T.val * F.val } PR#3
TF { T.val = F.val } PR#4
F  INTLIT { F.val = INTLIT.lexval } PR#5

SDD
CFG
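A recursive descent evaluator sketch in C for this SDD (our own rendering): each procedure returns the synthesized val attribute of its non-terminal, the semantic rules PR#1–PR#5 appear as the assignments, and the left-recursive productions are realized as loops, which preserves left associativity. Single digits stand in for INTLIT.

#include <stdio.h>

static const char *in;

static int F(void)                      /* F -> INTLIT { F.val = INTLIT.lexval } */
{
    return *in++ - '0';                 /* a single digit stands in for INTLIT */
}

static int T(void)                      /* T -> T*F | F */
{
    int val = F();                      /* T.val = F.val            (PR#4) */
    while (*in == '*') {
        in++;
        val = val * F();                /* T.val = T.val * F.val    (PR#3) */
    }
    return val;
}

static int E(void)                      /* E -> E+T | T */
{
    int val = T();                      /* E.val = T.val            (PR#2) */
    while (*in == '+') {
        in++;
        val = val + T();                /* E.val = E.val + T.val    (PR#1) */
    }
    return val;
}

int main(void)
{
    in = "2+3*4";
    printf("E.val = %d\n", E());        /* prints E.val = 14 */
    return 0;
}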
Syntax Directed Translation
• Let's take a string to see how semantic analysis happens:
• S = 2+3*4. [The parse tree corresponding to S is shown on the slide.]

 To evaluate the translation rules, we can employ one depth-first traversal of the parse tree.
 This is possible only because the SDT rules don't impose any specific order on evaluation, as long as children's attributes are computed before their parents', for a grammar having all synthesized attributes.
 Otherwise, we would have to figure out the best-suited plan to traverse the parse tree and evaluate all the attributes in one or more traversals.
Syntax Directed Translation
 The diagram shows how semantic analysis could happen. The flow of information is bottom-up, and all the children's attributes are computed before their parents', as discussed above.
 Right-hand-side nodes are sometimes annotated with subscript 1 to distinguish between children and parent.
Syntax Directed Translation
• Grammar + semantic rules = SDT

• SDT for evaluation of an expression:
• E → E+T  { E.value = E.value + T.value }
•   | T    { E.value = T.value }
• T → T*F  { T.value = T.value * F.value }
•   | F    { T.value = F.value }
• F → num  { F.value = num.value }
Syntax Directed Translation : Top-Down Parsing

SDT:
E → E+T  { printf("+"); }       -1
  | T    { }                    -2
T → T*F  { printf("*"); }       -3
  | F    { }                    -4
F → num  { printf(num.lval); }  -5

This gives the postfix notation of the expression:
2+3*4 = 234*+
Syntax Directed Translation : Top-Down Parsing

[The slide traces the parse: the actions print the postfix notation of the expression, 2+3*4 = 234*+.]
Syntax Directed Translation : Bottom-up Parsing

In bottom-up parsing we reduce by the productions: num is reduced to F, which prints 2 as output; then F is reduced to T, which prints nothing; T is reduced to E, with no output; and so on. The output of this translation is: 234*+
Syntax Directed Translation : Bottom-up Parsing

[A second SDT example is traced on the slide; the output of this translation is: 23131]


Directed Acyclic Graph
• Used to identify common sub-expressions.
• A Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values among the basic blocks, and offers optimization too.
• To apply an optimization technique to a basic block, a DAG is constructed from the three-address code generated by intermediate code generation.
• It demonstrates how the value computed by a statement is used in subsequent statements.
• A DAG provides easy transformation of basic blocks.
• A DAG can be understood as follows:
 Leaf nodes represent identifiers, names, or constants.
 Interior nodes represent operators.
 Interior nodes also represent the results of expressions, or the identifiers/names where the values are to be stored or assigned.
Directed Acyclic Graph
• t0 = a + b
• t1 = t0 + c
• d = t0 + t1
Directed Acyclic Graph

Given expression: (a + a*(b-c)) + ((b-c)*d)

The three-address code of the above expression:
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
Directed Acyclic Graph
• Consider a basic block (renamed versions of the names on the right):
a = b + c      a0 = b0 + c0
d = b + a      d0 = b0 + a0
e = d + a      e0 = d0 + a0
Directed Acyclic Graph

Here, do you think
a = b + c and e = b + c
have the common sub-expression b + c?
Today's Topics
• Intermediate Code Generation
• Run-time Memory Allocation
Intermediate Code Generation/ Representation
• A compiler may produce explicit intermediate code representing the source program.
• These intermediate codes are generally machine (architecture) independent, but their level is close to the level of machine code.
• If a compiler translates the source language to its target machine language without the option of generating intermediate code, then for each new machine a full native compiler is required.
• Intermediate code eliminates the need for a new full compiler for every unique machine by keeping the analysis portion the same for all compilers.
• The second part of the compiler, synthesis, is changed according to the target machine.
• It becomes easier to apply source code modifications that improve code performance by applying code optimization techniques to the intermediate code.
Intermediate Code Generation/ Representation
Intermediate code can be represented in a variety of ways, each with its own benefits.

Intermediate code can be either:
o language specific (e.g., bytecode for Java, P-code for Pascal), or
o language independent (three-address code).

Methods/tools used to represent intermediate code:
o Abstract syntax tree
o Postfix notation
o Directed acyclic graph
o Three-address code

 High-Level IR –
o A high-level intermediate representation is very close to the source language itself.
o It can be generated easily from the source code, and we can easily apply code modifications to enhance performance.
o But for target machine optimization it is less preferred.
 Low-Level IR –
o This one is close to the target machine, which makes it suitable for register and memory allocation, instruction set selection, etc.
o It is good for machine-dependent optimizations.
Three-Address Code
• The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of an annotated syntax tree.
• That syntax tree can then be converted into a linear representation, e.g., postfix notation.
• Intermediate code tends to be machine-independent code.
• Therefore, the code generator assumes an unlimited number of storage locations (registers) when generating code.
• For example:
a = b + c * d;
• The intermediate code generator will divide this expression into sub-expressions and then generate the corresponding code:
r1 = c * d;
r2 = b + r1;
a = r2
Three-Address Code
• r1, r2 are used as registers in the target program.
• A three-address code has at most three address locations to calculate the expression. A three-address code can be represented in two forms: quadruples and triples.
• Quadruples
• Each instruction in the quadruple presentation is divided into four fields: operator, arg1, arg2, and result.
• The above example (a = b + c * d) is represented below in quadruple format:

      Op   arg1   arg2   Result
[0]   *    c      d      r1
[1]   +    b      r1     r2
[2]   =    r2            a
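One possible in-memory layout for quadruples, sketched in C (the field and type names are our own, not a standard API):

#include <stdio.h>

/* one quadruple: operator, two arguments, result */
struct quad { const char *op, *arg1, *arg2, *result; };

int main(void)
{
    /* a = b + c * d encoded as quadruples */
    struct quad code[] = {
        { "*", "c",  "d",  "r1" },
        { "+", "b",  "r1", "r2" },
        { "=", "r2", "",   "a"  },
    };
    for (int i = 0; i < 3; i++)
        printf("[%d]  %-2s %-3s %-3s %s\n",
               i, code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}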
Three-Address Code
• Triples
• Each instruction in the triple presentation has three fields: op, arg1, and arg2.
• The result of a sub-expression is referred to by the position of the expression that computes it.
• Triples are similar to a DAG and a syntax tree; they are equivalent to a DAG while representing expressions.

      Op   arg1   arg2
[0]   *    c      d
[1]   +    b      (0)
[2]   =    a      (1)

• Triples face the problem of code immovability during optimization: the results are positional, and changing the order or position of an expression may cause problems.
Three-Address Code
• Indirect Triples
• This representation is an enhancement over triples representation.
• It uses pointers instead of position to store results.
• This enables the optimizers to freely re-position the sub-expression to
produce an optimized code.
Code Optimizer
 Code optimization is a technique that tries to improve the code by eliminating unnecessary lines/blocks of code and arranging the statements in a sequence that speeds up program execution without wasting resources.
 The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
 Optimization is a program transformation technique which tries to improve the code by making it consume fewer resources (i.e., CPU, memory) and deliver high speed.
 Advantages: executes faster, efficient memory usage, yields better performance.
 A code-optimizing process must follow the three rules given below:
o The output code must not, in any way, change the meaning of the program.
o Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
o Optimization should itself be fast and should not delay the overall compilation process.
Code Optimizer
 Common steps in optimization:
o Data flow analysis – examine the program to find certain properties of interest.
o Code optimization – change the code, based on the data-flow-analysis information, in a way that improves performance.
• Efforts toward optimized code can be made at various levels of the compilation process:
 At the beginning, users can change/rearrange the code or use better algorithms to write the code.
 After generating intermediate code, the compiler can modify the intermediate code by improving address calculations and loops.
 While producing the target machine code, the compiler can make use of the memory hierarchy and CPU registers.
Code Optimizer
• Optimization can be categorized broadly into two types:
 machine independent and
 machine dependent.
• Platform-dependent techniques:
• Peephole optimization
• Instruction-level parallelism
• Data-level parallelism
• Cache optimization
• Redundant resources
• Platform-independent techniques:
• Loop optimization: loop unrolling, code movement, frequency reduction, loop jamming/fusion
• Constant folding
• Constant propagation
• Common sub-expression elimination
Machine-independent Optimization
 The compiler takes in the intermediate code and transforms the part of the code that does not involve any CPU registers and/or absolute memory locations. For example:
do
{
    item = 10;
    value = value + item;
} while (value < 100);
• This code involves repeated assignment of the identifier item, which if we put this way:
item = 10;
do
{
    value = value + item;
} while (value < 100);
• should not only save CPU cycles, but can be used on any processor.
Machine-dependent Optimization
 Machine-dependent optimization is done after the target code has
been generated and when the code is transformed according to the
target machine architecture.
 It involves CPU registers and may have absolute memory references
rather than relative references.
 Machine-dependent optimizers put efforts to take maximum
advantage of memory hierarchy.
 Must have knowledge about machine architecture
Basic Blocks
 Source code generally consists of a number of instructions that are always executed in sequence; these are considered the basic blocks of the code.
 Basic blocks do not have any jump statements within them, i.e., when the first instruction is executed, all the instructions in the same basic block will be executed in their sequence of appearance without losing the flow control of the program.
 A program can have various constructs as basic blocks, like IF-THEN-ELSE and SWITCH-CASE conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, etc.
 Basic blocks are important concepts from both the code generation and the optimization points of view.

 Basic blocks play an important role in identifying variables that are used more than once in a single basic block. If a variable is used more than once, the register allocated to that variable need not be emptied until the block finishes execution.
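The slides do not spell out how a program is split into basic blocks; a common approach is the leader rule, sketched here in C over a toy instruction array (the encoding is hypothetical): the first instruction is a leader, every jump target is a leader, and every instruction immediately after a jump is a leader; each basic block then runs from one leader up to the next.

#include <stdio.h>

struct instr { const char *text; int is_jump; int target; /* jump target index, -1 if none */ };

int main(void)
{
    struct instr code[] = {
        { "i = 0",            0, -1 },
        { "t = i < 10",       0, -1 },
        { "ifFalse t goto 6", 1,  6 },
        { "a[i] = 0",         0, -1 },
        { "i = i + 1",        0, -1 },
        { "goto 1",           1,  1 },
        { "return",           0, -1 },
    };
    int n = 7, leader[7] = { 0 };

    leader[0] = 1;                              /* rule 1: first instruction        */
    for (int i = 0; i < n; i++)
        if (code[i].is_jump) {
            leader[code[i].target] = 1;         /* rule 2: target of a jump         */
            if (i + 1 < n) leader[i + 1] = 1;   /* rule 3: instruction after a jump */
        }

    for (int i = 0; i < n; i++)
        printf("%s%2d: %s\n", leader[i] ? "-- block --\n" : "", i, code[i].text);
    return 0;
}

For this toy code the leaders are instructions 0, 1, 3, and 6, giving the blocks {0}, {1,2}, {3,4,5}, {6}.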
Control Flow Graph
 Basic blocks in a program can be represented by means of control flow graphs.
 A control flow graph depicts how program control is passed among the blocks.
 It is a useful tool that helps in optimization by helping to locate any unwanted loops in the program.
Common Code optimization Techniques
o Common sub-expression elimination
o Compile time evaluation
 Constant propagation
 Constant Folding
o Code movement
o Code motion(frequency reduction)
o Dead code elimination
o Strength reduction
Loop Jamming / Fusion
Before:                          After:
int i, a[101], b[100];           int i, a[101], b[100];
for (i=0; i<100; i++)            for (i=0; i<100; i++)
    a[i] = 1;                    {
for (i=0; i<100; i++)                a[i] = 1;
    b[i] = 2;                        b[i] = 2;
                                 }

Loop Unrolling
Before:                          After:
for (i=0; i<5; i++)              printf("a");
    printf("a");                 printf("a");
                                 printf("a");
                                 printf("a");
                                 printf("a");

Code Motion / Frequency Reduction
Before:                          After:
a = 100;                         a = 100;
while (a > 0)                    x = y + z;
{                                while (a > 0)
    x = y + z;                   {
    if (a % x == 0)                  if (a % x == 0)
        printf("%d", a);                 printf("%d", a);
}                                }
Optimization techniques for Basic Blocks
o Common sub-expression elimination
o Dead code elimination
o Renaming temporary variables
o Interchange of statements
o Algebraic transformations
Peephole Optimization
o A technique applied to improve the performance of a program by examining a short sequence of instructions in a window (the peephole) and replacing those instructions with a faster, shorter sequence.
o Can be used to optimize either intermediate or object code.
o A machine-dependent optimization.
o Optimization techniques used in the peephole:
 Redundant instruction elimination (load and store instructions)
 Removal of unreachable code
 Flow-of-control optimization (elimination of unnecessary jumps)
 Algebraic simplification
 Machine idioms
Redundant instruction elimination (successive simplification steps, left to right):

Step 1:                 Step 2:                Step 3:                Step 4:
int add_ten(int x)      int add_ten(int x)     int add_ten(int x)     int add_ten(int x)
{                       {                      {                      { return x + 10; }
    int y, z;               int y;                 int y = 10;
    y = 10;                 y = 10;                return x + y;
    z = x + y;              y = x + y;         }
    return z;               return y;
}                       }
1. Which of the following translation programs converts assembly language programs to an object program?
A. Assembler
B. Compiler
C. Macroprocessor
D. Linker

2. Type checking is normally done during
A. Lexical analysis
B. Syntax analysis
C. Syntax-directed translation
D. Code optimization
• Which of the following is/are language processors?
a. Assembler
b. Compiler
c. Interpreter
d. All of the above

• A translator for a low-level programming language is termed a(n)
a. Assembler
b. Compiler
c. Loader
d. Linker
• Which of the following is used for grouping characters into tokens?
a. Parser
b. Code optimizer
c. Code scanner
d. Code generator

• The output of the lexical analyzer is
a. Machine code
b. Intermediate code
c. A stream of tokens
d. A parse tree
Ans - C
• Match the following according to input(left column) to the phase(right
column) that process it
P) Syntax Tree (i) Code Generator
Q) Character Stream (ii) Syntax Analyzer
R) Intermediate representation (iii) Semantic Analyzer
S) Token Stream (iv) Lexical Analyzer

A) P(ii), Q(iii) , R(iv) , S(i)


B) P(ii), Q(i) , R(iii) , S(iv)
C) P(iii), Q(iv) , R(i) , S(ii)
D) P(i), Q(iv) , R(ii) , S(iii)
Counting the tokens
int max(int i); – How many tokens are there?
Ans: 7

int main()
{
    int a=10,b=20;
    printf("sum is =%d",a+b);
    return 0;
}
How many tokens?
a) 33  b) 27  c) 35  d) 23

int main()
{
    // two variables are declared
    int a,b;
    a=10;
    return 0;
}
How many tokens?
a) 18  b) 19  c) 23  d) 24
Ans - b
• Consider the grammar defined by the following rules, with two operators * and +:
S → T*P
T → U | T*U
P → Q+P | Q
Q → id
U → id
Which of the following is TRUE?
a) + is left associative while * is right associative
b) + is right associative while * is left associative
c) Both are right associative
d) Both are left associative

From the grammar we can determine associativity by looking at the productions. Consider the second production, T → T*U: T generates T*U recursively (left recursion), so * is left associative.
Similarly, P → Q+P is right recursive, so + is right associative.
So option B is correct.
• Consider line number 3 of the following C program.

int main ( ) {               /* Line 1 */
    int I, N;                /* Line 2 */
    fro (I = 0; I < N; I++); /* Line 3 */
}

Identify the compiler's response about this line while creating the object module:
(A) No compilation error
(B) Only a lexical error
(C) Only syntactic errors
(D) Both lexical and syntactic errors
Ans - C ("fro" is a lexically valid identifier, so the only errors are syntactic)
• Which of the following rules violate the requirements of an operator grammar? P, Q, R are non-terminals and r, s, t are terminals.
i) P → QR    ii) P → QsR    iii) P → ε    iv) P → QtRr

a) (i) only
b) (i) and (iii) only
c) (ii) and (iii) only
d) (iii) and (iv) only
Ans - b
• Which of the following statements is true?
A. SLR parser is more powerful than LALR
B. LALR parser is more powerful than Canonical LR Parser
C. Canonical LR parser is more powerful than LALR
D. The parsers SLR , Canonical LR and LALR have the same power

• Which of the following statements is false?


(A) An unambiguous grammar has same leftmost and rightmost derivation
(B) An LL(1) parser is a top-down parser
(C) LALR is more powerful than SLR
(D) An ambiguous grammar can never be LR(k) for any k

(Note on relative power: SLR < LALR < CLR)


• The least number of temporary variables required to create three-address code for the statement q+r/3+s-t*5+4*v/w is:
• t1=r/3
• t2=q+t1
• t3=t*5
• t4=s-t3
• t5=t2+t4
• t6=v/w
• t7=4*t6
• t8=t5+t7
Ans: 8 temporary variables are required
• The number of tokens in the following C statement
printf("i=%d,&i=%x",i,&i);
is
a) 3
b) 26
c) 10
d) 21

• Consider the following grammar:
P → xQRS
Q → yz | z
R → w | ε
S → y
What is FOLLOW(Q)?
a) {R}
b) {w}
c) {w,y}
d) {w,$}
• Q1: In a compiler, keywords of a language are recognized during
• (A) parsing of the program
• (B) the code generation
• (C) the lexical analysis of the program
• (D) dataflow analysis
• Ans: (C) the lexical analysis of the program

Q2. The lexical analysis for a modern computer language such as Java needs the power of
which one of the following machine models in a necessary and sufficient sense?
• (A) Finite state automata
• (B) Deterministic pushdown automata
• (C) Non-Deterministic pushdown automata
• (D) Turing Machine
• Ans :(A) Finite state automata
• An intermediate code form is
• Postfix notation
• Syntax trees
• Three-address code
• All of these

• An intermediate code form is suitable for
• Reading
• Debugging
• Testing
• Optimization

• The intermediate code generator gets its input from the
• Lexical analyzer
• Syntax analyzer
• Semantic analyzer
• Error handler

• Byte code is the intermediate language for the
• C++
• Java
• Java Virtual Machine
• C
• Which one of the following is NOT performed during compilation?
(A) Dynamic memory allocation
(B) Type checking
(C) Symbol table management
(D) Inline expansion

• In the following grammar [shown on the original slide], which of the following is true?
a) + is left associative while * is right associative
b) Both + and * are left associative
c) + is right associative while * is left associative
d) None of the above


Q. Heap allocation is required for languages.
a. that support recursion
b. that support dynamic data structure
c. that use dynamic scope rules
d. None of the above
