
Chapter 1

Lexical Analysis and Parsing

LEARNING OBJECTIVES

• Language processing system
• Lexical analysis
• Syntax analysis
• Context free grammars and ambiguity
• Types of parsing
• Top down parsing
• Bottom up parsing
• Conflicts
• Operator precedence grammar
• LR parser
• Canonical LR parser (CLR)

Language Processing System

Language Processors
A compiler takes a high-level program (the source program) and produces an equivalent low-level program (the target program), reporting any error messages found in the source:

  Source program → Compiler → Target program
                      ↓
                Error messages

Interpreter
An interpreter is a computer program that executes instructions written in a programming language. It either executes the source code directly, or translates the source code into some efficient intermediate representation and immediately executes this:

  Source program + Input → Interpreter → Output

Example: Early versions of the Lisp programming language, BASIC.

Passes
The number of iterations needed to scan the source code until the executable code is obtained is called the number of passes. A typical compiler is two pass. A single-pass compiler requires more memory, while a multi-pass compiler requires less memory.


Translator
A software system that converts source code from one form of a language into another form is called a translator. There are two types of translators, namely (1) the compiler and (2) the assembler. A compiler converts source code of a high-level language into a low-level language. An assembler converts assembly language code into binary code.

Compilers
A compiler is software that translates code written in a high-level language (i.e., the source language) into a target language. Example: source languages like C, Java, etc. Compilers are user friendly. The target language is like machine language, which is efficient for hardware.

Analysis–synthesis model of compilation
There are two parts of compilation: analysis (the front end) and synthesis (the back end).

Analysis: It breaks up the source program into pieces and creates an intermediate representation of the source program. This part is more language specific.

Synthesis: It constructs the desired target program from the intermediate representation. The target program is more machine specific, dealing with registers and memory locations.
6.4 | Unit 6  •  Compiler Design

Front end vs back end of a compiler
The front end includes all the analysis phases and the intermediate code generator, together with part of code optimization:

  Source program → Lexical analyzer → (tokens) → Parser → (syntax tree) → Intermediate code generator → three-address code

All of these phases interact with the symbol table and the error handler.
The back end includes the code optimization and code generation phases. The back end synthesizes the target program from the intermediate code.

Context of a compiler
In addition to a compiler, several other programs may be required to create an executable target program, like a preprocessor to expand macros. The target program created by a compiler may require further processing before it can be run. The language processing system looks like this:

  Source program with macros
  → Preprocessor → Modified source program
  → Compiler → Target assembly program
  → Assembler → Relocatable machine code
  → Loader/linker (together with library files and relocatable object files) → Absolute machine code

Phases
The compilation process is partitioned into subprocesses called phases. In order to translate high-level code to machine code, we go phase by phase, with each phase doing a particular task and passing its output to the next phase.

Lexical analysis or scanning
It is the first phase of a compiler. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.

Example: Consider the statement: if (a < b)
In this statement the tokens are if, (, a, <, b, ).
Number of tokens = 6
Identifiers: a, b
Keywords: if
Operators: <, (, )

Syntax analyzer or parser
• Tokens are grouped hierarchically into nested collections with collective meaning.
• A context free grammar (CFG) specifies the rules or productions for identifying constructs that are valid in a programming language. The output is a parse/syntax/derivation tree.

Example: Parse tree for –(id + id) using the following grammar:
E → E + E
E → E * E
E → –E        (G1)
E → (E)
E → id

The parse tree: the root E expands to – E; that E expands to ( E ); the parenthesized E expands to E + E; and both operand E's derive id.

Semantic analysis
• It checks the source program for semantic errors.
• Type checking is done in this phase, where the compiler checks that each operator has matching operands for semantic consistency with the language definition.
• It gathers the type information for the next phases.

Example 1: The bicycle rides the boy.
This statement has no meaning, but it is syntactically correct.

Example 2:
int a;
bool b;
char c;
c = a + b;
We cannot add an integer to a Boolean variable and assign the result to a character variable.

Intermediate code generation
The intermediate representation should have two important properties:
(i) It should be easy to produce.
(ii) It should be easy to translate into the target program.
'Three address code' is one of the common forms of intermediate code. Three address code consists of a sequence of instructions, each of which has at most three operands.

Example:
id1 = id2 + id3 × 10;
t1 := inttoreal(10)

t2 := id3 × t1
t3 := id2 + t2
id1 := t3

Code optimization
The output of this phase will result in faster-running machine code.

Example: For the above intermediate code the optimized code will be
t1 := id3 × 10.0
id1 := id2 + t1
Here we have eliminated the temporaries t2 and t3.

Code generation
• In this phase, the target code is generated.
• Generally the target code can be either relocatable machine code or assembly code.
• Intermediate instructions are each translated into a sequence of machine instructions.
• Assignment of registers is also done.

Example:
MOVF id3, R2
MULF #10.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol table management
A symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.

What is the use of a symbol table?
1. To record the identifiers used in the source program.
2. To record each identifier's type and scope.
3. If it is a procedure name, to record the number of arguments, the types of the arguments, the method of passing each argument (e.g., by reference) and the type returned.

Error detection and reporting
(i) The lexical phase can detect errors where the characters remaining in the input 'do not form any token'.
(ii) Errors of the type 'violation of syntax' of the language are detected by syntax analysis.
(iii) The semantic phase tries to detect constructs that have the right syntactic structure but no meaning.
Example: adding two array names, etc.

Lexical Analysis
Lexical analysis is the first phase in compiler design. The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens, one for each lexeme in the source program. The stream of tokens is sent to the parser for syntax analysis. There is interaction with the symbol table as well:

  Source program → Lexical analyzer ⇄ Parser
  (the parser requests the next token; the lexical analyzer returns tokens; both consult the symbol table and report to the error handler)

Lexeme: A sequence of characters in the source program that matches the pattern for a token. It is the smallest logical unit of a program.
Example: 10, x, y, <, >, =

Tokens: These are the classes of similar lexemes.
Example:
Operators: <, >, =
Identifiers: x, y
Constants: 10
Keywords: if, else, int

Operations performed by the lexical analyzer
1. Identification of lexemes and spelling check.
2. Stripping out comments and white space (blank, newline, tab, etc.).
3. Correlating error messages generated by the compiler with the source program.
4. If the source program uses a macro-preprocessor, the expansion of macros may also be performed by the lexical analyzer.

Example 1: Take the following example from Fortran:
DO 5 I = 1.25
Number of tokens = 5
The first lexeme is the keyword DO; the tokens are DO, 5, I, =, 1.25.

Example 2: An example from a C program:
for (int i = 1; i <= 10; i++)
Here the tokens are for, (, int, i, =, 1, ;, i, <=, 10, ;, i, ++, )
Number of tokens = 14

LEX compiler
The lexical analyzer divides the source code into tokens. To implement a lexical analyzer we have two techniques, namely hand coding, and the LEX tool. LEX is an automated tool which generates a lexical analyzer from rules given as regular expressions. These rules are also called pattern recognizing rules.
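The grouping of characters into lexemes described above can be sketched as a small hand-coded scanner. The following is an illustrative Python sketch, not the LEX tool itself; the token-class names and patterns are assumptions chosen to fit the examples in this section:

```python
import re

# Each (token class, pattern) pair is tried in order at the current position.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|int)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("CONSTANT",   r"\d+"),
    ("OPERATOR",   r"<=|>=|==|[<>=+\-*/();{}]"),
    ("SKIP",       r"\s+"),          # white space is stripped, not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Group the input characters into lexemes and emit (token, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("if (a < b)"))   # six tokens: if ( a < b )
```

Run on the statement if (a < b) from the earlier example, it yields the six tokens listed there.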

Syntax Analysis
This is the second phase of the compiler; it checks the syntax and constructs the syntax/parse tree. The input of the parser is tokens and its output is a parse/syntax tree.

Constructing a parse tree
Construction of the derivation tree for a given input string by using the productions of the grammar is called a parse tree.
Consider the grammar
S → E + E/E * E
E → id
The parse tree for the string ω = id + id * id: the root S expands to E + E; the right E expands to E * E; and the three remaining E's derive id, id and id.

Role of the parser
1. Construct a parse tree.
2. Error reporting and correcting (or) recovery. A parser can be modeled by using a CFG (context free grammar), recognized by using a pushdown automaton/table-driven parser.
3. A CFG will only check the correctness of a sentence with respect to syntax, not its meaning.

  Source program → Lexical analyzer ⇄ Parser → Parse tree
  (the parser asks for the next token; lexical errors are reported by the lexical analyzer, syntax errors by the parser)

How to construct a parse tree?
Parse trees can be constructed in two ways:
(i) Top-down parser: It builds parse trees from the top (root) to the bottom (leaves).
(ii) Bottom-up parser: It starts from the leaves and works up to the root.
In both cases, the input to the parser is scanned from left to right, one symbol at a time.

Parser generator
A parser generator is a tool which creates a parser.
Example: compiler-compiler, YACC
The input of these parser generators is the grammar we use, and the output is the parser code. The parser generator is used for the construction of the compiler's front end.

Scope of declarations
The scope of a declaration is the portion of program text in which the rules defined by the language apply to it. Within the defined scope, an entity can legally access the declared entities. The scope of a declaration always contains its immediate scope, the region of the declarative portion immediately enclosing the declaration. The scope starts at the beginning of the declaration and continues till the end of the declaration, whereas for an overloadable declaration the immediate scope begins only once the profile of the callable entity has been determined. The visible part refers to the portion of the declaration text which is visible from outside.

Syntax Error Handling
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly.
3. It should not slow down the processing of correct programs.

Error Recovery Strategies
There are four strategies: panic mode, phrase level, error productions and global correction.

Panic mode: On discovering an error, the parser discards input symbols one at a time until one of the synchronizing tokens is found.

Phrase level: A parser may perform local correction on the remaining input. It may replace a prefix of the remaining input.

Error productions: The parser can generate appropriate error messages to indicate the erroneous construct that has been recognized in the input.

Global correction: There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.

Context Free Grammars and Ambiguity
A grammar is a set of rules or productions which generates a collection of finite/infinite strings. It is a 4-tuple defined as G = (V, T, P, S), where
V = set of variables
T = set of terminals
P = set of production rules
S = start symbol
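The 4-tuple view of a grammar can be made concrete in code. The following Python sketch is purely illustrative (the names V, T, P, derives are hypothetical choices): it encodes a small balanced-parentheses grammar S → (S) | e as G = (V, T, P, S) and tests string membership by expanding the leftmost variable, with a depth bound to guarantee termination:

```python
# The grammar S -> (S) | e written as the 4-tuple G = (V, T, P, S).
V = {"S"}                                 # variables (non-terminals)
T = {"(", ")"}                            # terminals
P = {"S": [["(", "S", ")"], []]}          # productions; [] stands for e
start = "S"

def derives(sent, target, depth=10):
    """Naive test: can the sentential form `sent` derive the string `target`?
    Expands the leftmost variable; the depth bound guarantees termination."""
    if all(sym in T for sym in sent):
        return sent == list(target)
    if depth == 0 or sum(sym in T for sym in sent) > len(target):
        return False                      # already too many terminals
    i = next(k for k, sym in enumerate(sent) if sym in V)
    return any(derives(sent[:i] + body + sent[i + 1:], target, depth - 1)
               for body in P[sent[i]])

print(derives([start], "(())"), derives([start], "(()"))   # → True False
```

This brute-force expansion is only a teaching device; the parsing techniques in the rest of the chapter exist precisely to avoid such blind search.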

Example: S → (S)/e
S → (S)   (1)
S → e    (2)
Here S is the start symbol and the only variable; (, ) and e are terminals; (1) and (2) are the production rules.

Sentential forms
If S ⇒* α, where α may contain non-terminals, then we say that α is a sentential form of G.

Sentence: A sentence is a sentential form with no non-terminals.
Example: –(id + id) is a sentence of the grammar (G1).

Derivations
Leftmost derivation:
E ⇒ −E ⇒ −(E) ⇒ −(E + E) ⇒ −(id + E) ⇒ −(id + id)
Rightmost derivation:
E ⇒ −E ⇒ −(E) ⇒ −(E + E) ⇒ −(E + id) ⇒ −(id + id)
Rightmost derivations are also known as canonical derivations.

The parse tree for −(id + id): the root E expands to − E; that E expands to ( E ); the inner E expands to E + E; and both operand E's derive id.

Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, a grammar that produces more than one leftmost or more than one rightmost derivation for the same sentence is ambiguous.

For example, consider the following grammar:
String → String + String/String – String/0/1/2/…/9
9 – 5 + 2 has two parse trees:

Figure 1 (leftmost derivation): the root expands to String + String; the left String expands to String – String with leaves 9 and 5, and the right String derives 2, giving (9 – 5) + 2.

Figure 2 (rightmost derivation): the root expands to String – String; the left String derives 9, and the right String expands to String + String with leaves 5 and 2, giving 9 – (5 + 2).

• Ambiguity is problematic because the meaning of the program can be incorrect.
• Ambiguity can be handled in several ways:
1. Enforce associativity and precedence.
2. Rewrite the grammar, by eliminating left recursion and left factoring.

Removal of ambiguity
A grammar is said to be ambiguous if there exists more than one derivation tree for a given input string. Ambiguity of grammars in general is undecidable, but the ambiguity of a particular grammar can often be eliminated by rewriting the grammar.

Example:
E → E + E/id          (ambiguous grammar)
E → E + T/T
T → id                (rewritten, unambiguous grammar)

Left recursion
Left recursion can take the parser into an infinite loop, so we need to remove left recursion.

Elimination of left recursion
A → Aα/β is left recursive. It can be replaced by the non-recursive grammar:
A → βA′
A′ → αA′/e

In general,
A → Aα1/Aα2/…/Aαm/β1/β2/…/βn
can be replaced by the productions
A → β1A′/β2A′/…/βnA′
A′ → α1A′/α2A′/…/αmA′/e

Example 3: Eliminate left recursion from
E → E + T/T
T → T * F/F
F → (E)/id

Solution: E → E + T/T is in the form A → Aα/β, so we can write it as
E → TE′
E′ → +TE′/e
Similarly, the other productions are rewritten as
T → FT′
T′ → *FT′/e
F → (E)/id
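The elimination rule above (replace A → Aα/β with A → βA′ and A′ → αA′/e) can be written directly as a small transformation. The following Python sketch is illustrative; productions are lists of symbols, the empty list stands for e, and the function name is a made-up choice:

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Replace  A -> A a1 | ... | A am | b1 | ... | bn   with
               A  -> b1 A' | ... | bn A'
               A' -> a1 A' | ... | am A' | e
    Productions are lists of symbols; [] stands for e. A' is written nt + "'".
    """
    alphas = [p[1:] for p in productions if p and p[0] == nt]    # recursive tails
    betas  = [p for p in productions if not (p and p[0] == nt)]  # the rest
    if not alphas:
        return {nt: productions}                                 # nothing to do
    new_nt = nt + "'"
    return {
        nt:     [b + [new_nt] for b in betas],
        new_nt: [a + [new_nt] for a in alphas] + [[]],
    }

# Example 3 from the text: E -> E + T | T  becomes  E -> T E', E' -> + T E' | e
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T/T from Example 3, it yields E → TE′ together with E′ → +TE′/e, matching the solution above.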

Example 4: Eliminate left recursion from the grammar
S → (L)/a
L → L, S/b

Solution:
S → (L)/a
L → bL′
L′ → , SL′/e

Left factoring
A grammar with common prefixes is called a non-deterministic grammar. To make it deterministic we need to remove the common prefixes. This process is called left factoring.
The grammar A → αβ1/αβ2 can be transformed into
A → αA′
A′ → β1/β2

Example 5: What is the resultant grammar after left factoring the following grammar?
S → iEtS/iEtSeS/a
E → b

Solution:
S → iEtSS′/a
S′ → eS/e
E → b

Types of Parsing
Parsers are classified as follows:
• Top-down parsers (predictive parsers): recursive descent parsing, and non-recursive descent parsing (table-driven parsing).
• Bottom-up parsers: operator precedence parsing, and the LR parsers: SLR, CLR and LALR.

Top-down Parsing
A parse tree is constructed for the input starting from the root, creating the nodes of the parse tree in preorder. It simulates the leftmost derivation.

Backtracking parsing
If we make a sequence of erroneous expansions and subsequently discover a mismatch, we undo the effects and roll back the input pointer. This method is also known as brute force parsing.

Example:
S → cAd
A → ab/a
Let the string to generate be w = cad.
First expand S → cAd and try A → ab. The resulting parse tree generates the string cabd, but w = cad: the third symbol does not match. So we report the error and go back to A. Now consider the other alternative for production A, namely A → a. The string generated is cad, and w = cad, so parsing is now successful.
Here we have used backtracking. It is a costly and time-consuming approach, and thus an outdated one.

Predictive Parsers
By eliminating left recursion and by left factoring the grammar, we can obtain a parse tree without backtracking. To construct a predictive parser, we must know:
1. the current input symbol, and
2. the non-terminal which is to be expanded.
A procedure is associated with each non-terminal of the grammar.

Recursive descent parsing
In recursive descent parsing, we execute a set of recursive procedures to process the input. The sequence of procedures called implicitly defines a parse tree for the input.

Non-recursive predictive parsing (table-driven parsing)
• It maintains a stack explicitly, rather than implicitly via recursive calls.
• A table-driven predictive parser has:
→ an input buffer (e.g., a + b $)
→ a stack (e.g., x y z $)
→ a parsing table M
→ an output stream
The predictive parsing program reads the input buffer, manipulates the stack, and consults the parsing table M to produce its output.
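The backtracking (brute force) parse of w = cad described above can be sketched as a recursive procedure that tries the alternatives for each non-terminal in order and rolls the input pointer back on a mismatch. This is an illustrative Python sketch with hypothetical function names:

```python
# The grammar of the example: S -> cAd, A -> ab | a.
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def parse(symbol, s, pos):
    """Return the input position after matching `symbol` at `pos`, or None.
    Non-terminals try their alternatives in order; on a mismatch the input
    pointer simply rolls back to `pos` (backtracking)."""
    if symbol not in GRAMMAR:                       # terminal: must match input
        return pos + 1 if pos < len(s) and s[pos] == symbol else None
    for alternative in GRAMMAR[symbol]:             # A -> ab is tried before A -> a
        p = pos
        for sym in alternative:
            p = parse(sym, s, p)
            if p is None:
                break                               # mismatch: try next alternative
        else:
            return p
    return None

def accepts(s):
    return parse("S", s, 0) == len(s)

print(accepts("cad"), accepts("cabd"))   # → True True
```

On "cad" the parser first tries A → ab (yielding "cabd"), fails on the third symbol, then backtracks to A → a and succeeds, exactly as in the worked example.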

Constructing a parsing table
To construct a parsing table, we have to learn about two functions:
1. FIRST ( )
2. FOLLOW ( )

FIRST(X): To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or e can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → e is a production, then add e to FIRST(X).
3. If X is a non-terminal and X → Y1Y2 … Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and e is in all of FIRST(Y1), …, FIRST(Yi–1); that is, Y1 … Yi–1 ⇒* e. If e is in FIRST(Yj) for all j = 1, 2, …, k, then add e to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X); if Y1 does not derive e, then we add nothing more to FIRST(X), but if Y1 ⇒* e, then we also add FIRST(Y2), and so on.

FOLLOW(A): To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right-end marker.
2. If there is a production A → αBβ, then everything in FIRST(β) except e is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains e, then everything in FOLLOW(A) is in FOLLOW(B).

Example: Consider the grammar
E → TE′
E′ → +TE′/e
T → FT′
T′ → *FT′/e
F → (E)/id
Then
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E′) = {+, e}
FIRST(T′) = {*, e}
FOLLOW(E) = FOLLOW(E′) = {), $}
FOLLOW(T) = FOLLOW(T′) = {+, ), $}
FOLLOW(F) = {*, +, ), $}

Steps for the construction of the predictive parsing table
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If e is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If e is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error.

By applying these rules to the above grammar, we get the following parsing table:

Non-terminal | id       | +          | *          | (        | )       | $
E            | E → TE′  |            |            | E → TE′  |         |
E′           |          | E′ → +TE′  |            |          | E′ → e  | E′ → e
T            | T → FT′  |            |            | T → FT′  |         |
T′           |          | T′ → e     | T′ → *FT′  |          | T′ → e  | T′ → e
F            | F → id   |            |            | F → (E)  |         |

The parser is controlled by a program. The program considers x, the symbol on top of the stack, and a, the current input symbol:
1. If x = a = $, the parser halts and announces successful completion of parsing.
2. If x = a ≠ $, the parser pops x off the stack and advances the input pointer to the next input symbol.
3. If x is a non-terminal, the program consults entry M[x, a] of the parsing table M. This entry will be either an x-production of the grammar or an error entry. If M[x, a] = {x → UVW}, the parser replaces x on top of the stack by WVU (with U on top). If M[x, a] = error, the parser calls an error recovery routine.

For example, the moves made by the predictive parser on input id + id * id are shown below:

Matched   | Stack     | Input      | Action
          | E$        | id+id*id$  |
          | TE′$      | id+id*id$  | Output E → TE′
          | FT′E′$    | id+id*id$  | Output T → FT′
          | idT′E′$   | id+id*id$  | Output F → id
id        | T′E′$     | +id*id$    | Match id
id        | E′$       | +id*id$    | Output T′ → e
id        | +TE′$     | +id*id$    | Output E′ → +TE′
id+       | TE′$      | id*id$     | Match +
id+       | FT′E′$    | id*id$     | Output T → FT′
id+       | idT′E′$   | id*id$     | Output F → id
id+id     | T′E′$     | *id$       | Match id
id+id     | *FT′E′$   | *id$       | Output T′ → *FT′
id+id*    | FT′E′$    | id$        | Match *
id+id*    | idT′E′$   | id$        | Output F → id
id+id*id  | T′E′$     | $          | Match id
id+id*id  | E′$       | $          | Output T′ → e
id+id*id  | $         | $          | Output E′ → e
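The FIRST and FOLLOW rules above can be computed by iterating until no set grows. The following Python sketch does this for the expression grammar of this section; the empty string "" plays the role of e, and the helper names are illustrative:

```python
# FIRST/FOLLOW computation for the expression grammar of this section.
G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
NT = set(G)
FIRST = {A: set() for A in NT}
FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                       # rule 1: $ goes into FOLLOW(start)

def first_of(seq):
    """FIRST of a string of grammar symbols; "" marks e."""
    out = set()
    for sym in seq:
        f = FIRST[sym] if sym in NT else {sym}
        out |= f - {""}
        if "" not in f:
            return out                     # sym cannot derive e: stop here
    return out | {""}                      # every symbol can derive e

changed = True
while changed:                             # iterate to a fixed point
    changed = False
    for A, prods in G.items():
        for body in prods:
            f = first_of(body)
            if not f <= FIRST[A]:
                FIRST[A] |= f; changed = True
            for i, B in enumerate(body):
                if B in NT:
                    trail = first_of(body[i + 1:])
                    add = (trail - {""}) | (FOLLOW[A] if "" in trail else set())
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add; changed = True

print(sorted(FIRST["E"]), sorted(FOLLOW["F"]))
```

It reproduces the sets listed above, e.g. FIRST(E) = {(, id} and FOLLOW(F) = {*, +, ), $}.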

Bottom-up Parsing
• This parsing constructs the parse tree for an input string beginning at the leaves and working up towards the root.
• The general style of bottom-up parsing is shift–reduce parsing.

Shift–Reduce Parsing
Shift–reduce parsing reduces a string to the start symbol of the grammar. It simulates the reverse of a rightmost derivation: in every step a particular substring is matched (in left-to-right fashion) to the right side of some production and is then substituted by the non-terminal on the left-hand side of that production.

For example, consider the grammar
S → aABe
A → Abc/b
B → d
In bottom-up parsing the string 'abbcde' is verified as
abbcde
aAbcde
aAde     (in reverse order)
aABe
S

Rightmost derivation:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
For bottom-up parsing, we use the rightmost derivation in reverse.

Handle of a string: A substring that matches the RHS of some production, and whose reduction to the non-terminal on the LHS is a step along the reverse of some rightmost derivation:
S ⇒*rm αAw ⇒rm αβw
Here β is a handle of αβw. Right sentential forms of an unambiguous grammar have one unique handle.

Example: For the grammar S → aABe, A → Abc/b, B → d:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Note: The handles, reduced in reverse order, are the first b in abbcde, Abc in aAbcde, d in aAde, and finally aABe itself.

Handle pruning: The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning. Handle pruning forms the basis for bottom-up parsing. To construct the rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn = w,
apply the following simple algorithm:
for i ← n down to 1:
  find the handle Ai → βi in γi;
  replace βi with Ai to generate γi–1.

Consider the cut of a parse tree of a certain right sentential form: the root S covers α A w, and A covers β; here A → β is a handle of αβw.

Stack implementation of shift–reduce parser
The shift–reduce parser consists of an input buffer, a stack and a parse table.
The input buffer holds the input string, with each cell containing only one input symbol.
The stack contains grammar symbols; grammar symbols are inserted using the shift operation, and they are reduced using the reduce operation once a handle has been obtained on top of the stack.
The parse table consists of two parts, goto and action, which are constructed using the terminals, the non-terminals and the compiler items.

Let us illustrate the stack implementation. Let the grammar be
S → AA
A → aA
A → b
and let the input string w be abab$.

Stack  | Input String | Action
$      | abab$        | Shift
$a     | bab$         | Shift
$ab    | ab$          | Reduce (A → b)
$aA    | ab$          | Reduce (A → aA)
$A     | ab$          | Shift
$Aa    | b$           | Shift
$Aab   | $            | Reduce (A → b)
$AaA   | $            | Reduce (A → aA)
$AA    | $            | Reduce (S → AA)
$S     | $            | Accept

Shift–reduce parsing with a stack: There are two problems with this technique:
(i) locating the handle, and
(ii) deciding which production to use.

General construction using a stack
1. 'Shift' input symbols onto the stack until a handle is found on top of it.
2. 'Reduce' the handle to the corresponding non-terminal.
3. 'Accept' when the input is consumed and only the start symbol is on the stack.
4. On errors, call an error reporting/recovery routine.
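The stack-based trace above can be reproduced by a small driver. This Python sketch is illustrative only: it reduces as soon as any production body appears on top of the stack, which happens to pick the correct handles for this particular grammar but, as the two problems just listed warn, is not a general handle-finding method:

```python
# Shift-reduce parse of w = abab for S -> AA, A -> aA | b.
PRODUCTIONS = [("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]

def shift_reduce(w):
    """Shift symbols and greedily reduce whenever a production body sits on
    top of the stack (adequate for this grammar, not in general)."""
    stack, i, trace = [], 0, []
    while True:
        for lhs, rhs in PRODUCTIONS:
            if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]               # pop the handle
                stack.append(lhs)                   # push the LHS non-terminal
                trace.append(f"reduce {lhs} -> {''.join(rhs)}")
                break
        else:
            if i < len(w):
                stack.append(w[i]); i += 1          # shift the next input symbol
                trace.append("shift")
            else:
                break
    return stack == ["S"], trace

ok, steps = shift_reduce("abab")
print(ok, steps.count("shift"))   # → True 4
```

The trace it records (shift, shift, reduce A → b, reduce A → aA, …, reduce S → AA) matches the table above line by line.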

Viable prefixes: The set of prefixes of a right sentential form that can appear on the stack of a shift–reduce parser are called viable prefixes.

Conflicts
Conflicts are of two kinds: shift/reduce conflicts and reduce/reduce conflicts.

Shift/reduce conflict
Example: stmt → if expr then stmt | if expr then stmt else stmt | any other statement
If "if expr then stmt" is on the stack, we cannot tell whether it is a handle to be reduced or whether an else should be shifted, i.e., a shift/reduce conflict.

Reduce/reduce conflict
Example:
S → aA/bB
A → c
B → c
With w = ac, on seeing c we cannot tell whether to reduce by A → c or by B → c; this gives a reduce/reduce conflict.

Operator Precedence Grammar
In an operator grammar, no production rule can have:
• e on the right side;
• two adjacent non-terminals on the right side.

Example 1: E → E + E/E – E/id is an operator grammar.
Example 2:
E → AB
A → a
B → b
is not an operator grammar.
Example 3: E → EOE/id is not an operator grammar.

Precedence relations: If
a ⋖ b, then b has higher precedence than a;
a ≐ b, then b has the same precedence as a;
a ⋗ b, then b has lower precedence than a.

Common ways of determining the precedence relation between a pair of terminals:
1. Traditional notions of associativity and precedence. Example: × has higher precedence than +, so × ⋗ + and + ⋖ ×.
2. First construct an unambiguous grammar for the language which reflects the correct associativity and precedence in its parse trees.

Operator precedence relations from associativity and precedence
Let us use $ to mark the end of each string, and define $ ⋖ b and b ⋗ $ for all terminals b. Consider the grammar:
E → E + E/E × E/id
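Interleaving the ⋖/⋗ relations between $ … $, as the table that follows prescribes, can be sketched as a simple table lookup. In this illustrative Python sketch, ASCII "<" stands for ⋖, ">" for ⋗, and "*" for the × of the grammar; REL encodes exactly the relations of the table below:

```python
# Precedence relations for E -> E + E | E x E | id ("<" is the ⋖ relation, ">" is ⋗).
REL = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
}

def insert_relations(tokens):
    """Interleave the precedence relations into $ tokens $, as done in the text."""
    s = ["$"] + tokens + ["$"]
    out = [s[0]]
    for a, b in zip(s, s[1:]):
        out += [REL[(a, b)], b]
    return " ".join(out)

print(insert_relations(["id", "+", "id", "*", "id"]))
# → $ < id > + < id > * < id > $
```

The output reproduces the annotated string $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $ shown later in this section; a handle is then read off between each ⋖ … ⋗ pair.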

The operator precedence table for this grammar is:

      id    +    ×    $
id          ⋗    ⋗    ⋗
+     ⋖     ⋗    ⋖    ⋗
×     ⋖     ⋗    ⋗    ⋗
$     ⋖     ⋖    ⋖    accept

To find a handle:
1. Scan the string from the left until a ⋗ is encountered.
2. Then scan backwards (to the left) over any ≐ until a ⋖ is encountered.
3. The handle is everything to the left of the first ⋗ and to the right of the ⋖ that was encountered.

After inserting the precedence relations, $id + id * id$ becomes
$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $

Precedence functions: Instead of storing the entire table of precedence relations, we can encode it by precedence functions f and g, which map terminal symbols to integers, such that:
1. f(a) < g(b) whenever a ⋖ b
2. f(a) = g(b) whenever a ≐ b
3. f(a) > g(b) whenever a ⋗ b

Finding precedence functions for a table
1. Create symbols f(a) and g(a) for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a way that if a ≐ b, then f(a) and g(b) are in the same group.
3. Create a directed graph:
if a ⋖ b, place an edge from g(b) to f(a);
if a ⋗ b, place an edge from f(a) to g(b).
4. If the constructed graph has a cycle, then no precedence functions exist. If there are no cycles, let f(a) be the length of the longest path beginning at the group of f(a), and let g(a) be the length of the longest path beginning at the group of g(a).

Disadvantages of operator precedence parsing
• It cannot handle unary minus.
• It is difficult to decide which language is recognized by the grammar.

Advantages
1. It is simple.
2. It is powerful enough for expressions in programming languages.

Error cases
1. No relation holds between the terminal on the top of the stack and the next input symbol.
2. A handle is found, but there is no production with this handle as the right side.

Error recovery
1. Each empty entry is filled with a pointer to an error routine.
2. Based on the handle, the routine tries to recover from the situation. To recover, we must modify (insert/change):
1. the stack, or
2. the input, or
3. both.
We must be careful that we do not get into an infinite loop.

LR Parsers
• In LR (k), L stands for left-to-right scanning, R stands for rightmost derivation (in reverse), and k stands for the number of lookahead symbols.
• LR parsers are table driven, much like the non-recursive LL parsers. A grammar which is used in the construction of an LR parser is an LR grammar. For a grammar to be LR, it is sufficient that a left-to-right shift–reduce parser be able to recognize the handles of right-sentential forms when they appear on the top of the stack.
• Such table-driven parsers run in time linear in the length of the input.
• LR parsers are faster than the LL (1) parser.
• LR parsing is attractive because:
  1. It is the most general non-backtracking shift–reduce parsing method.
  2. The class of grammars that can be parsed using LR methods is a proper superset of the class parsed by predictive parsers: LL (1) grammars ⊂ LR (1) grammars.
  3. An LR parser can detect a syntactic error as soon as possible in a left-to-right scan of the input.
• LR parsers can be implemented in 3 ways:
1. Simple LR (SLR): the easiest to implement, but the least powerful of the three.
2. Canonical LR (CLR): the most powerful and the most expensive.
3. Look-ahead LR (LALR): intermediate between the other two. It works on most programming language grammars.

Disadvantages of LR parsers
1. Detecting a handle is an overhead, so a parser generator is used.
2. The main problem is finding the handle on the stack so that it can be replaced with the non-terminal on the left-hand side of the production.

The LR parsing algorithm
• An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The driver/parser program is the same for all these LR parsers; only the parsing table changes from one parser to another.

  Input:  a1 a2 … ai … an $
  Stack:  Sm xm … S1 x1 S0
  The LR parsing program reads the input, consults the action table (indexed by states and the terminals plus $) and the goto table (indexed by states and non-terminals), and produces the output.

Stack: It stores a string of the form S0 x1 S1 … xm Sm, where each Si is a state and each xi is a grammar symbol. Each state symbol summarizes the information contained in the stack below it.

Parsing table: It consists of two parts:
1. the ACTION part, and
2. the GOTO part.

ACTION part: Let Sm be the state on top of the stack and ai the current input symbol. Then action[Sm, ai] can have one of four values:
1. shift S, where S is a state;
2. reduce by a grammar production A → β;
3. accept;
4. error.

GOTO part: If goto (S, A) = X, where S is a state and A is a non-terminal, then GOTO maps state S and non-terminal A to state X.

Configuration
(S0 x1 S1 x2 S2 … xm Sm, ai ai+1 … an $)
The next move of the parser is based on action[Sm, ai]. The resulting configurations are as follows:
1. If action[Sm, ai] = shift S:
(S0 x1 S1 x2 S2 … xm Sm ai S, ai+1 … an $)
2. If action[Sm, ai] = reduce A → β:
(S0 x1 S1 x2 S2 … xm–r Sm–r A S, ai ai+1 … an $)
where S = goto[Sm–r, A] and r is the length of β.
3. If action[Sm, ai] = accept, parsing is complete.
4. If action[Sm, ai] = error, the parser calls an error recovery routine.

Example: The parsing table for the following grammar is shown below:
1. E → E + T   2. E → T

3. T → T * F   4. T → F
5. F → (E)    6. F → id

        Action                              Goto
State   id    +     *     (     )     $     E     T     F
0       S5                S4                1     2     3
1             S6                      acc
2             r2    S7          r2    r2
3             r4    r4          r4    r4
4       S5                S4                8     2     3
5             r6    r6          r6    r6
6       S5                S4                      9     3
7       S5                S4                            10
8             S6                S11
9             r1    S7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

The moves of the LR parser on the input string id*id+id are shown below ('reduce 6' means reduce with the 6th production, F → id):

Stack           | Input      | Action
0               | id*id+id$  | shift 5
0 id 5          | *id+id$    | reduce 6 (F → id); goto[0, F] = 3
0 F 3           | *id+id$    | reduce 4 (T → F); goto[0, T] = 2
0 T 2           | *id+id$    | shift 7
0 T 2 * 7       | id+id$     | shift 5
0 T 2 * 7 id 5  | +id$       | reduce 6 (F → id); goto[7, F] = 10
0 T 2 * 7 F 10  | +id$       | reduce 3 (T → T * F); goto[0, T] = 2
0 T 2           | +id$       | reduce 2 (E → T); goto[0, E] = 1
0 E 1           | +id$       | shift 6
0 E 1 + 6       | id$        | shift 5
0 E 1 + 6 id 5  | $          | reduce 6 (F → id); goto[6, F] = 3
0 E 1 + 6 F 3   | $          | reduce 4 (T → F); goto[6, T] = 9
0 E 1 + 6 T 9   | $          | reduce 1 (E → E + T); goto[0, E] = 1
0 E 1           | $          | accept

Constructing the SLR parsing table
LR (0) item: An LR (0) item of a grammar G is a production of G with a dot at some position on the right side of the production.

Example: For A → BCD the possible LR (0) items are:
A → .BCD
A → B.CD
A → BC.D
A → BCD.
The item A → B.CD means we have seen an input string derivable from B and hope to see a string derivable from CD.

The LR (0) items are constructed as a DFA from the grammar to recognize viable prefixes; the items can also be viewed as the states of an NFA. The canonical LR (0) collection provides the basis for constructing an SLR parser. To construct the LR (0) items, define:
(a) an augmented grammar, and
(b) the closure and goto operations.

Augmented grammar (G′): If G is a grammar with start symbol S, then G′, the augmented grammar for G, has a new start symbol S′ and the additional production S′ → S. The purpose of G′ is to indicate when to stop parsing and announce acceptance of the input.

Closure operation: Closure (I) is computed as follows:
1. Initially, every item in I is added to closure (I).
2. If A → α.Bβ is in closure (I) and B → γ is a production, then add B → .γ to I.

Goto operation: Goto (I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I.

Items are of two kinds:
Kernel items: S′ → .S and all items whose dots are not at the left end.
Non-kernel items: items which have their dots at the left end.

Construction of sets of items
procedure items (G′);
begin
  C := closure ({[S′ → .S]});
  repeat
    for each set of items I in C and each grammar symbol X
      such that goto (I, X) is not empty and not in C do
        add goto (I, X) to C;
  until no more sets of items can be added to C;
end;

Example: The LR (0) items for the grammar
E′ → E
E → E + T/T
T → T * F/F
F → (E)/id
are given below:
I0: E′ → .E
E → .E + T
E → .T
T → .T * F
6.14 | Unit 6  •  Compiler Design

T → .F
F → .(E)
F → .id
I1: goto (I0, E)
E′ → E.
E → E. + T
I2: goto (I0, T)
E → T.
T → T. * F
I3: goto (I0, F)
T → F.
I4: goto (I0, ( )
F → (.E)
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
I5: goto (I0, id)
F → id.
I6: goto (I1, +)
E → E + .T
T → .T * F
T → .F
F → .(E)
F → .id
I7: goto (I2, *)
T → T * .F
F → .(E)
F → .id
I8: goto (I4, E)
F → (E.)
E → E. + T
I9: goto (I6, T)
E → E + T.
T → T. * F
I10: goto (I7, F)
T → T * F.
I11: goto (I8, ))
F → (E).

For viable prefixes, construct the DFA over these item sets. (The figure shows the goto transitions: I0 on E to I1; I1 on + to I6; I6 on T to I9; I9 on * to I7; I0 on T to I2; I2 on * to I7; I7 on F to I10; I0 on F to I3; I0 on ( to I4; I4 on E to I8; I8 on ) to I11 and on + to I6; I4 on T to I2 and on F to I3; I6 on F to I3; I4, I6 and I7 also go to I4 on ( and to I5 on id; I0 on id to I5.)

SLR parsing table construction
1. Construct the canonical collection of sets of LR (0) items for G′.
2. Create the parsing action table as follows:
(a) If a is a terminal, [A → α.aβ] is in Ii and goto (Ii, a) = Ij, then set action [i, a] to 'shift j'.
(b) If [A → α.] is in Ii, then set action [i, a] to 'reduce A → α' for all a in FOLLOW (A).
(c) If [S′ → S.] is in Ii, then set action [i, $] to 'accept'.
3. Create the parsing goto table for all non-terminals A: if goto (Ii, A) = Ij, then goto [i, A] = j.
4. All entries not defined by steps 2 and 3 are made errors.
5. The initial state of the parser is the one containing the item S′ → .S.
The parsing table constructed using the above algorithm is known as the SLR (1) table for G.

Note: Every SLR (1) grammar is unambiguous, but not every unambiguous grammar is SLR.

Example 6: Construct the SLR parsing table for the following grammar:
1. S → L = R
2. S → R
3. L → *R
4. L → id
5. R → L

Solution: For the construction of the SLR parsing table, add the S′ → S production:
S′ → S
S → L = R
S → R
L → *R
L → id
R → L
The LR (0) items will be:
I0: S′ → .S
S → .L = R
S → .R
L → .*R
L → .id
R → .L
I1: goto (I0, S)
S′ → S.
I2: goto (I0, L)
S → L. = R
R → L.
I3: goto (I0, R)
S → R.
I4: goto (I0, *)
L → *.R
R → .L
L → .*R
L → .id
I5: goto (I0, id)
L → id.
I6: goto (I2, =)
S → L = .R
R → .L
L → .*R
L → .id
I7: goto (I4, R)
L → *R.
I8: goto (I4, L)
R → L.
I9: goto (I6, R)
S → L = R.

(The DFA of the LR (0) item sets: I0 on S to I1; I0 on L to I2; I2 on = to I6; I6 on R to I9, on L to I8, on * to I4, on id to I5; I0 on R to I3; I0 on * to I4; I4 on R to I7, on L to I8, on * to I4, on id to I5; I0 on id to I5.)

FOLLOW (S) = {$}
FOLLOW (L) = {=, $}
FOLLOW (R) = {=, $}

            Action                        Goto
States      =        *     id    $       S    L    R
0                    S4    S5            1    2    3
1                                acc
2           S6, r5               r5
3                                r2
4                    S4    S5                 8    7
5           r4                   r4
6                    S4    S5                 8    9
7           r3                   r3
8           r5                   r5
9                                r1

For action [2, =] we get both S6 and r5 (since = is in FOLLOW (R)).
∴ Here we are getting a shift-reduce conflict, so the grammar is not SLR (1).

Canonical LR Parsing (CLR)
•• To avoid some invalid reductions, the states need to carry more information.
•• The extra information is put into a state by including a terminal symbol as a second component of an item.
•• The general form of an item is [A → α.β, a], where A → αβ is a production and a is a terminal or the right end marker ($). We call it an LR (1) item.

LR (1) item
It is a combination of an LR (0) item along with the lookahead of the item. Here 1 refers to the lookahead of the item.

Construction of the sets of LR (1) items  Function closure (I):
Begin
Repeat
For each item [A → α.Bβ, a] in I,
each production B → .γ in G′,
and each terminal b in FIRST (βa)
such that [B → .γ, b] is not in I,
add [B → .γ, b] to I;
Until no more items can be added to I;
End;

Example 7:  Construct the CLR parsing table for the following grammar:
S′ → S
S → CC
C → cC/d
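The LR (1) closure function above can be sketched in Python for Example 7's grammar. This is an illustrative encoding, not the book's code: an item is a (head, body, dot, lookahead) tuple, and `first` is simplified because no non-terminal of this grammar derives the empty string.

```python
# Example 7's augmented grammar: S' -> S, S -> CC, C -> cC | d.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
TERMINALS = {"c", "d"}

def first(seq):
    """FIRST of a symbol string (simplified: no symbol here derives the empty string)."""
    out = set()
    for sym in seq:
        if sym in TERMINALS or sym == "$":
            out.add(sym)
        else:
            for body in GRAMMAR[sym]:
                out |= first(body)
        break   # stop at the first symbol, since it cannot vanish
    return out

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(result):
            if dot < len(body) and body[dot] in GRAMMAR:
                # Lookaheads of the added items come from FIRST(beta a).
                for b in first(body[dot + 1:] + (la,)):
                    for prod in GRAMMAR[body[dot]]:
                        item = (body[dot], prod, 0, b)
                        if item not in result:
                            result.add(item)
                            changed = True
    return frozenset(result)

i0 = closure({("S'", ("S",), 0, "$")})
for item in sorted(i0):
    print(item)
```

The six items printed are exactly the I0 computed in the solution below: S′ → .S with lookahead $, S → .CC with lookahead $, and C → .cC and C → .d each with lookaheads c and d.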
Solution:  The initial set of items is
I0: S′ → .S, $
S → .CC, $
For the item [A → α.Bβ, a] here A = S, α = ∈, B = C, β = C and a = $.
FIRST (βa) = FIRST (C$) = FIRST (C) = {c, d}
So, add the items [C → .cC, c], [C → .cC, d], [C → .d, c] and [C → .d, d].
∴ Our first set is
I0: S′ → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d
I1: goto (I0, S)
S′ → S., $
I2: goto (I0, C)
S → C.C, $
C → .cC, $
C → .d, $
I3: goto (I0, c)
C → c.C, c/d
C → .cC, c/d
C → .d, c/d
I4: goto (I0, d)
C → d., c/d
I5: goto (I2, C)
S → CC., $
I6: goto (I2, c)
C → c.C, $
C → .cC, $
C → .d, $
I7: goto (I2, d)
C → d., $
I8: goto (I3, C)
C → cC., c/d
I9: goto (I6, C)
C → cC., $

The CLR table is:

            Action              Goto
States      c     d     $       S    C
I0          S3    S4            1    2
I1                      acc
I2          S6    S7                 5
I3          S3    S4                 8
I4          r3    r3
I5                      r1
I6          S6    S7                 9
I7                      r3
I8          r2    r2
I9                      r2

Consider the string 'dcd' and its derivation:
S ⇒ CC ⇒ CcC ⇒ Ccd ⇒ dcd

Stack              Input     Action
0                  dcd$      shift 4
0 d 4              cd$       reduce 3 i.e. C → d
0 C 2              cd$       shift 6
0 C 2 c 6          d$        shift 7
0 C 2 c 6 d 7      $         reduce C → d
0 C 2 c 6 C 9      $         reduce C → cC
0 C 2 C 5          $         reduce S → CC
0 S 1              $         accept

Example 8:  Construct the CLR parsing table for the grammar:
S → L = R
S → R
L → *R
L → id
R → L

Solution:  The canonical set of items is
I0: S′ → .S, $
S → .L = R, $
S → .R, $
R → .L, $
L → .*R, =/$  [= from FIRST (= R$); $ via R → .L, $]
L → .id, =/$
I1: goto (I0, S)
S′ → S., $
I2: goto (I0, L)
S → L. = R, $
R → L., $
I3: goto (I0, R)
S → R., $
I4: goto (I0, *)
L → *.R, =/$
R → .L, =/$
L → .*R, =/$
L → .id, =/$
I5: goto (I0, id)
L → id., =/$
I6: goto (I7, L)
R → L., $
I7: goto (I2, =)
S → L = .R, $
R → .L, $
L → .*R, $
L → .id, $
I8: goto (I4, R)
L → *R., =/$
I9: goto (I4, L)
R → L., =/$
I10: goto (I7, R)
S → L = R., $
I11: goto (I7, *)
L → *.R, $
R → .L, $
L → .*R, $
L → .id, $
I12: goto (I7, id)
L → id., $
I13: goto (I11, R)
L → *R., $

(The goto graph of these sets: I0 on S to I1; I0 on L to I2; I2 on = to I7; I7 on L to I6, on R to I10, on * to I11, on id to I12; I11 on R to I13, on L to I6, on * to I11, on id to I12; I0 on R to I3; I0 on * to I4; I4 on R to I8, on L to I9, on * to I4, on id to I5; I0 on id to I5.)

We construct the CLR parsing table based on the above diagram. It has 14 states (I0 to I13). The shift-reduce conflict seen in the SLR parser does not arise here.

States    id     *      =     $       S    L    R
0         S5     S4                   1    2    3
1                             acc
2                       S7    r5
3                             r2
4         S5     S4                        9    8
5                       r4    r4
6                             r5
7         S12    S11                       6    10
8                       r3    r3
9                       r5    r5
10                            r1
11        S12    S11                       6    13
12                            r4
13                            r3

Moves on the input id = id:

Stack                Input
0                    id = id$
0 id 5               = id$
0 L 2                = id$
0 L 2 = 7            id$
0 L 2 = 7 id 12      $
0 L 2 = 7 L 6        $
0 L 2 = 7 R 10       $
0 S 1 (accept)       $

Every SLR (1) grammar is an LR (1) grammar.
A CLR (1) parser will have more states than an SLR parser.

LALR Parsing Table
•• The tables obtained by it are considerably smaller than the canonical LR tables.
•• LALR stands for Lookahead LR.
•• The number of states in the SLR and LALR parsing tables for a grammar G are equal.
•• But LALR parsers recognize more grammars than SLR.
•• YACC creates a LALR parser for the given grammar.
•• YACC stands for 'Yet Another Compiler-Compiler'.
•• An easy, but space-consuming, LALR table construction is explained below:
1. Construct C = {I0, I1, … In}, the collection of sets of LR (1) items.
2. Find all sets having the same core; replace these sets by their union.
3. Let C′ = {J0, J1, … Jm} be the resulting sets of LR (1) items. If there is a parsing action conflict, then the grammar is not LALR (1).
4. Let K be the union of all sets of items having the same core. Then goto (J, X) = K.
•• If there are no parsing action conflicts, then the grammar is said to be LALR (1).
•• The collection of items constructed is called the LALR (1) collection.

Example 9:  Construct the LALR parsing table for the following grammar:
S′ → S
S → CC
C → cC/d

Solution:  We already got the LR (1) items and CLR parsing table for this grammar (Example 7).
After merging, I3 and I6 are replaced by I36:
I36: C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$
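The merging step that produced I36 can be sketched in Python. This is an illustrative encoding, not the book's code: each LR (1) set is a collection of (core_item, lookahead) pairs, and sets with equal cores are unioned, exactly as step 2 of the construction prescribes.

```python
# Merge LR(1) item sets that share the same core (LALR step 2).
def merge_by_core(item_sets):
    merged = {}
    for s in item_sets:
        core = frozenset(item for item, la in s)     # drop the lookaheads
        merged.setdefault(core, set()).update(s)     # union the lookaheads
    return list(merged.values())

# I3 and I6 of Example 7 share the core {C -> c.C, C -> .cC, C -> .d};
# an item core is encoded as (head, body, dot_position).
i3 = {(("C", ("c", "C"), 1), "c"), (("C", ("c", "C"), 1), "d"),
      (("C", ("c", "C"), 0), "c"), (("C", ("c", "C"), 0), "d"),
      (("C", ("d",), 0), "c"), (("C", ("d",), 0), "d")}
i6 = {(("C", ("c", "C"), 1), "$"),
      (("C", ("c", "C"), 0), "$"),
      (("C", ("d",), 0), "$")}

i36, = merge_by_core([i3, i6])   # the two sets collapse into one
lookaheads = {la for item, la in i36 if item == ("C", ("d",), 0)}
print(sorted(lookaheads))        # ['$', 'c', 'd'] -> the item C -> .d, c/d/$
```

Feeding I4 and I7 (or I8 and I9) through the same function produces the I47 and I89 sets used next.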
I47: by merging I4 and I7
C → d., c/d/$
I89: I8 and I9 are replaced by I89
C → cC., c/d/$
The LALR parsing table for this grammar is given below:

           Action               Goto
State      c      d      $      S    C
0          S36    S47           1    2
1                        acc
2          S36    S47                5
36         S36    S47                89
47         r3     r3     r3
5                        r1
89         r2     r2     r2

Example:  Consider the grammar:
S′ → S
  S → aAd
  S → bBd
  S → aBe
  S → bAe
  A → c
  B → c
which generates the strings acd, bcd, ace and bce.
The LR (1) items are:
I0: S′ → .S, $
   S → .aAd, $
  S → .bBd, $
  S → .aBe, $
 S → .bAe, $
I1: goto (I0, S)
S′ → S., $
I2: goto (I0, a)
S → a.Ad, $
S → a.Be, $
A → .c, d
B → .c, e
I3: goto (I0, b)
S → b.Bd, $
S → b.Ae, $
A → .c, e
B → .c, d
I4: goto (I2, A)
S → aA.d, $
I5: goto (I2, B)
S → aB.e, $
I6: goto (I2, c)
A → c., d
B → c., e
I7: goto (I3, c)
A → c., e
B → c., d
I8: goto (I4, d)
S → aAd., $
I9: goto (I5, e)
S → aBe., $
If we union I6 and I7:
A → c., d/e
B → c., d/e
It generates a reduce-reduce conflict.

Notes:
1. The merging of states with common cores can never produce a shift-reduce conflict, because the shift action depends only on the core, not on the lookahead.
2. SLR and LALR tables for a grammar always have the same number of states (several hundreds for a typical language), whereas CLR tables have thousands of states for the same grammar.

Comparison of parsing methods

Method      Item          Goto and Closures       Grammar it applies to
SLR (1)     LR (0) item   Different from LR (1)   SLR (1) ⊂ LR (1)
LR (1)      LR (1) item                           LR (1): largest class of LR grammars
LALR (1)    LR (1) item   Same as LR (1)          LALR (1) ⊂ LR (1)

(The accompanying diagram shows the containment of grammar classes: LL (1) and LR (0) lie inside SLR (1), which lies inside LALR (1), which lies inside CLR (1).)

Every LR (0) grammar is SLR (1), but vice versa is not true.

Difference between SLR, LALR and CLR parsers
The differences among SLR, LALR and CLR parsers are discussed below in terms of size, efficiency, time and space.
Table 1  Comparison of parsing methods

SI. No.   Factors                SLR Parser                 LALR Parser                  CLR Parser
1         Size                   Smaller                    Smaller                      Larger
2         Method                 It is based on the         This method is applicable    This is the most powerful
                                 FOLLOW function            to a wider class than SLR    of the three
3         Syntactic features     Less exposure compared     Most of them are             Less
                                 to other LR parsers        expressed
4         Error detection        Not immediate              Not immediate                Immediate
5         Time and space         Less time and space        More time and space          More time and space
          complexity
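All three LR families share the same driver loop; only the ACTION and GOTO tables differ. The sketch below is illustrative Python (not the book's code): it encodes the SLR (1) table of the expression grammar built earlier in the chapter and replays the id*id+id parse from the moves table.

```python
# ACTION maps (state, terminal) to ('s', j) shift, ('r', k) reduce, or 'acc'.
ACTION = {
    (0, 'id'): ('s', 5), (0, '('): ('s', 4),
    (1, '+'): ('s', 6), (1, '$'): 'acc',
    (2, '+'): ('r', 2), (2, '*'): ('s', 7), (2, ')'): ('r', 2), (2, '$'): ('r', 2),
    (3, '+'): ('r', 4), (3, '*'): ('r', 4), (3, ')'): ('r', 4), (3, '$'): ('r', 4),
    (4, 'id'): ('s', 5), (4, '('): ('s', 4),
    (5, '+'): ('r', 6), (5, '*'): ('r', 6), (5, ')'): ('r', 6), (5, '$'): ('r', 6),
    (6, 'id'): ('s', 5), (6, '('): ('s', 4),
    (7, 'id'): ('s', 5), (7, '('): ('s', 4),
    (8, '+'): ('s', 6), (8, ')'): ('s', 11),
    (9, '+'): ('r', 1), (9, '*'): ('s', 7), (9, ')'): ('r', 1), (9, '$'): ('r', 1),
    (10, '+'): ('r', 3), (10, '*'): ('r', 3), (10, ')'): ('r', 3), (10, '$'): ('r', 3),
    (11, '+'): ('r', 5), (11, '*'): ('r', 5), (11, ')'): ('r', 5), (11, '$'): ('r', 5),
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8, (4, 'T'): 2,
        (4, 'F'): 3, (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}
# Productions 1..6: E->E+T, E->T, T->T*F, T->F, F->(E), F->id (head, body length).
PROD = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3), 4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}

def parse(tokens):
    stack, toks = [0], tokens + ['$']
    while True:
        act = ACTION.get((stack[-1], toks[0]))
        if act == 'acc':
            return True
        if act is None:
            return False                       # error entry: reject
        if act[0] == 's':                      # shift: push state, consume token
            stack.append(act[1]); toks.pop(0)
        else:                                  # reduce: pop |body| states, push goto
            head, size = PROD[act[1]]
            del stack[len(stack) - size:]
            stack.append(GOTO[(stack[-1], head)])

print(parse(['id', '*', 'id', '+', 'id']))   # True
```

The sequence of shifts and reductions the loop performs on id*id+id is exactly the Stack/Input/Action trace shown with the SLR table.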

Exercises

Practice Problems 1
Directions for questions 1 to 15:  Select the correct alternative from the given choices.
1. Consider the grammar
S → a
S → ab
The given grammar is:
(A) LR (1) only
(B) LL (1) only
(C) Both LR (1) and LL (1)
(D) LR (1) but not LL (1)
2. Which of the following is an unambiguous grammar that is not LR (1)?
(A) S → Uab | Vac
U → d
V → d
(B) S → Uab/Vab/Vac
U → d
V → d
(C) S → AB
A → a
B → b
(D) S → Ab
A → a/c

Common data for questions 3 and 4:  Consider the grammar:
S → T; S/∈
T → UR
U → x/y/[S]
R → .T/∈
3. Which of the following are correct FIRST and FOLLOW sets for the above grammar?
(i) FIRST (S) = FIRST (T) = FIRST (U) = {x, y, [, ∈}
(ii) FIRST (R) = {., ∈}
(iii) FOLLOW (S) = {], $}
(iv) FOLLOW (T) = FOLLOW (R) = {;}
(v) FOLLOW (U) = {., ;}
(A) (i) and (ii) only
(B) (ii), (iii), (iv) and (v) only
(C) (ii), (iii) and (iv) only
(D) All the five
4. If an LL (1) parsing table is constructed for the above grammar, the parsing table entry for [S, [ ] is
(A) S → T; S (B) S → ∈
(C) T → UR (D) U → [S]

Common data for questions 5 to 7:  Consider the augmented grammar
S → X
X → (X)/a
5. If a DFA is constructed for the LR (1) items of the above grammar, then the number of states present in it is:
(A) 8 (B) 9
(C) 7 (D) 10
6. The given grammar is
(A) Only LR (1)
(B) Only LL (1)
(C) Both LR (1) and LL (1)
(D) Neither LR (1) nor LL (1)
7. What is the number of shift-reduce steps for input (a)?
(A) 15 (B) 14
(C) 13 (D) 16
8. Consider the following two sets of LR (1) items of a grammar:
X → c.X, c/d     X → c.X, $
X → .cX, c/d     X → .cX, $
X → .d, c/d      X → .d, $
Which of the following statements related to merging of the two sets in the corresponding LALR parser is/are FALSE?
1. Cannot be merged since the lookaheads are different.
2. Can be merged but will result in S-R conflict.
3. Can be merged but will result in R-R conflict.
4. Cannot be merged since goto on c will lead to two different sets.
(A) 1 only (B) 2 only
(C) 1 and 4 only (D) 1, 2, 3 and 4
9. Which of the following grammar rules violate the requirements of an operator grammar?
(i) A → BcC (ii) A → dBC
(iii) A → C/∈ (iv) A → cBdC
(A) (i) only (B) (i) and
(C) (ii) and (iii) only (D) (i) and (iv) only
10. The FIRST and FOLLOW sets for the grammar:
S → SS + /SS * /a
(A) FIRST (S) = {a}
FOLLOW (S) = {+, *, $}
(B) FIRST (S) = {+}
FOLLOW (S) = {+, *, $}
(C) FIRST (S) = {a}
FOLLOW (S) = {+, *}
(D) FIRST (S) = {+, *}
FOLLOW (S) = {+, *, $}
11. A shift-reduce parser carries out the actions specified within braces immediately after reducing with the corresponding rule of the grammar:
S → xxW {print '1'}
S → y {print '2'}
W → Sz {print '3'}
What is the translation of 'x x x x y z z'?
(A) 1231 (B) 1233
(C) 2131 (D) 2321
12. After constructing the predictive parsing table for the following grammar:
Z → d
Z → XYZ
Y → c/∈
X → Y
X → a
The entry/entries for [Z, d] is/are
(A) Z → d
(B) Z → XYZ
(C) Both (A) and (B)
(D) X → Y
13. The following grammar is
S → AaAb/BbBa
A → ∈
B → ∈
(A) LL (1) (B) Not LL (1)
(C) Recursive (D) Ambiguous
14. Compute FIRST (P) for the below grammar:
P → AQRbe/mn/DE
A → ab/∈
Q → q1q2/∈
R → r1r2/∈
D → d
E → e
(A) {m, a} (B) {m, a, q1, r1, b, d}
(C) {d, e} (D) {m, n, a, b, d, e, q1, r1}
15. After constructing the LR (1) parsing table for the augmented grammar
S′ → S
S → BB
B → aB/c
What will be the action [I3, a]?
(A) Accept (B) S7
(C) r2 (D) S5

Practice Problems 2
Directions for questions 1 to 19:  Select the correct alternative from the given choices.
1. Consider the grammar
S → aSb
S → aS
S → ∈
This grammar is ambiguous, as shown by generating which of the following strings?
(A) aa (B) ∈
(C) aaa (D) aab
2. To convert the grammar E → E + T into an LL grammar,
(A) use left factoring
(B) use CNF form
(C) eliminate left recursion
(D) Both (B) and (C)
3. Given the following expressions of a grammar
E → E × F/F + E/F
F → F ? F/id
Which of the following is true?
(A) × has higher precedence than +
(B) ? has higher precedence than ×
(C) + and ? have the same precedence
(D) + has higher precedence than ×
4. The action of parsing the source program into the proper syntactic classes is known as
(A) Lexical analysis
(B) Syntax analysis
(C) Interpretation analysis
(D) Parsing
5. Which of the following is not a bottom-up parser?
(A) LALR (B) Predictive parser
(C) CLR (D) SLR
6. A system program that combines separately compiled modules of a program into a form suitable for execution is a(n)
(A) Assembler.
(B) Linking loader.
(C) Cross compiler.
(D) None of these.
7. Resolution of externally defined symbols is performed by a
(A) Linker. (B) Loader.
(C) Compiler. (D) Interpreter.
8. LR parsers are attractive because
(A) They can be constructed to recognize CFGs corresponding to almost all programming constructs.
(B) There is no need of backtracking.
(C) Both (A) and (B).
(D) None of these
9. YACC builds up a
(A) SLR parsing table
(B) Canonical LR parsing table
(C) LALR parsing table
(D) None of these
10. Languages which have many types, but where the type of every name and expression must be calculated at compile time, are
(A) Strongly typed languages
(B) Weakly typed languages
(C) Loosely typed languages
(D) None of these
11. Consider the grammar shown below:
S → iEtSS′/a/b
S′ → eS/∈
In the predictive parse table M of this grammar, the entries M [S′, e] and M [S′, $] respectively are
(A) {S′ → eS} and {S′ → ∈}
(B) {S′ → eS} and { }
(C) {S′ → ∈} and {S′ → ∈}
(D) {S′ → eS, S′ → ∈} and {S′ → ∈}
12. Consider the grammar S → CC, C → cC/d.
The grammar is
(A) LL (1)
(B) SLR (1) but not LL (1)
(C) LALR (1) but not SLR (1)
(D) LR (1) but not LALR (1)
13. Consider the grammar
E → E + n/E – n/n
For a sentence n + n – n, the handles in the right sentential forms of the reduction are
(A) n, E + n and E + n – n
(B) n, E + n and E + E – n
(C) n, n + n and n + n – n
(D) n, E + n and E – n
14. A top-down parser uses ___ derivation.
(A) Left most derivation
(B) Left most derivation in reverse
(C) Right most derivation
(D) Right most derivation in reverse
15. Which of the following statements is false?
(A) An unambiguous grammar has a single leftmost derivation.
(B) An LL (1) parser is top-down.
(C) LALR is more powerful than SLR.
(D) An ambiguous grammar can never be LR (k) for any k.
16. Merging states with a common core may produce ___ conflicts in an LALR parser.
(A) Reduce-reduce
(B) Shift-reduce
(C) Both (A) and (B)
(D) None of these
17. An LL (k) grammar
(A) Has to be a CFG
(B) Has to be unambiguous
(C) Cannot have left recursion
(D) All of these
18. The I0 state of the LR (0) items for the grammar
S → AS/b
A → SA/a
is
(A) S′ → .S
S → .AS
S → .b
A → .SA
A → .a
(B) S → .AS
S → .b
A → .SA
A → .a
(C) S → .AS
S → .b
(D) S → A
A → .SA
A → .a
19. In the predictive parsing table for the grammar:
S → FR
R → ×S/∈
F → id
What will be the entry for [S, id]?
(A) S → FR
(B) F → id
(C) Both (A) and (B)
(D) None of these
Previous Years’ Questions

1. Consider the grammar:
S → (S) | a
Let the number of states in the SLR (1), LR (1) and LALR (1) parsers for the grammar be n1, n2 and n3 respectively. The following relationship holds good:  [2005]
(A) n1 < n2 < n3 (B) n1 = n3 < n2
(C) n1 = n2 = n3 (D) n1 ≥ n3 ≥ n2
2. Consider the following grammar:
  S → S * E
  S → E
E → F + E
E → F
F → id
Consider the following LR (0) items corresponding to the grammar above.
(i) S → S * .E
(ii) E → F. + E
(iii) E → F + .E
Given the items above, which two of them will appear in the same set in the canonical sets-of-items for the grammar? [2006]
(A) (i) and (ii) (B) (ii) and (iii)
(C) (i) and (iii) (D) None of the above
3. Consider the following statements about the context-free grammar
G = {S → SS, S → ab, S → ba, S → ∈}
(i) G is ambiguous
(ii) G produces all strings with equal number of a’s and b’s
(iii) G can be accepted by a deterministic PDA.
Which combination below expresses all the true statements about G?  [2006]
(A) (i) only (B) (i) and (iii) only
(C) (ii) and (iii) only (D) (i), (ii) and (iii)
4. Consider the following grammar:
S → FR
R → *S|∈
F → id
In the predictive parser table, M, of the grammar, the entries M[S, id] and M[R, $] respectively are [2006]
(A) {S → FR} and {R → ∈}
(B) {S → FR} and { }
(C) {S → FR} and {R → *S}
(D) {F → id} and {R → ∈}
5. Which one of the following grammars generates the language L = {a^i b^j | i ≠ j}? [2006]
(A) S → AC|CB
C → aCb|a|b
A → aA|∈
B → Bb|∈
(B) S → aS|Sb|a|b
(C) S → AC|CB
C → aC|b|∈
A → aA|∈
B → Bb|∈
(D) S → AC|CB
C → aCb|∈
A → aA|a
B → Bb|b
6. In the correct grammar above, what is the length of the derivation (number of steps starting from S) to generate the string a^l b^m with l ≠ m? [2006]
(A) max (l, m) + 2
(B) l + m + 2
(C) l + m + 3
(D) max (l, m) + 3
7. Which of the following problems is undecidable?  [2007]
(A) Membership problem for CFGs.
(B) Ambiguity problem for CFGs.
(C) Finiteness problem for FSAs.
(D) Equivalence problem for FSAs.
8. Which one of the following is a top-down parser?  [2007]
(A) Recursive descent parser.
(B) Operator precedence parser.
(C) An LR (k) parser.
(D) An LALR (k) parser.
9. Consider the grammar with non-terminals N = {S, C, S1}, terminals T = {a, b, i, t, e}, with S as the start symbol, and the following set of rules: [2007]
S → iCtSS1|a
S1 → eS|∈
C → b
The grammar is NOT LL (1) because:
(A) It is left recursive
(B) It is right recursive
(C) It is ambiguous
(D) It is not context-free.
10. Consider the following two statements:
P: Every regular grammar is LL (1)
Q: Every regular set has an LR (1) grammar
Which of the following is TRUE? [2007]
(A) Both P and Q are true
(B) P is true and Q is false
(C) P is false and Q is true
(D) Both P and Q are false

Common data for questions 11 and 12:  Consider the CFG with {S, A, B} as the non-terminal alphabet, {a, b} as the terminal alphabet, S as the start symbol and the following set of production rules:
S → aB     S → bA
B → b      A → a
B → bS     A → aS
B → aBB    A → bAA
11. Which of the following strings is generated by the grammar? [2007]
(A) aaaabb (B) aabbbb
(C) aabbab (D) abbbba
12. For the correct answer string to Q.11, how many derivation trees are there?  [2007]
(A) 1 (B) 2
(C) 3 (D) 4
13. Which of the following describes a handle (as applicable to LR-parsing) appropriately? [2008]
(A) It is the position in a sentential form where the next shift or reduce operation will occur.
(B) It is a non-terminal whose production will be used for reduction in the next step.
(C) It is a production that may be used for reduction in a future step along with a position in the sentential form where the next shift or reduce operation will occur.
(D) It is the production p that will be used for reduction in the next step along with a position in the sentential form where the right hand side of the production may be found.
14. Which of the following statements are true?
(i) Every left-recursive grammar can be converted to a right-recursive grammar and vice-versa
(ii) All ∈-productions can be removed from any context-free grammar by suitable transformations
(iii) The language generated by a context-free grammar, all of whose productions are of the form X → w or X → wY (where w is a string of terminals and Y is a non-terminal), is always regular
(iv) The derivation trees of strings generated by a context-free grammar in Chomsky Normal Form are always binary trees [2008]
(A) (i), (ii), (iii) and (iv) (B) (ii), (iii) and (iv) only
(C) (i), (iii) and (iv) only (D) (i), (ii) and (iv) only
15. An LALR (1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if [2008]
(A) The SLR (1) parser for G has S-R conflicts
(B) The LR (1) parser for G has S-R conflicts
(C) The LR (0) parser for G has S-R conflicts
(D) The LALR (1) parser for G has reduce-reduce conflicts
16. S → aSa | bSb | a | b;
The language generated by the above grammar over the alphabet {a, b} is the set of [2009]
(A) All palindromes.
(B) All odd length palindromes.
(C) Strings that begin and end with the same symbol.
(D) All even length palindromes.
17. Which data structure in a compiler is used for managing information about variables and their attributes? [2010]
(A) Abstract syntax tree
(B) Symbol table
(C) Semantic stack
(D) Parse table
18. The grammar S → aSa|bS|c is [2010]
(A) LL (1) but not LR (1)
(B) LR (1) but not LL (1)
(C) Both LL (1) and LR (1)
(D) Neither LL (1) nor LR (1)
19. The lexical analysis for a modern computer language such as Java needs the power of which one of the following machine models in a necessary and sufficient sense? [2011]
(A) Finite state automata
(B) Deterministic pushdown automata
(C) Non-deterministic pushdown automata
(D) Turing machine

Common data for questions 20 and 21:  For the grammar below, a partial LL (1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2, and E3. ∈ is the empty string, $ indicates end of input, and | separates alternate right hand sides of productions.
S → a A b B | b A a B | ∈
A → S
B → S

      a         b         $
S     E1        E2        S → ∈
A     A → S     A → S     Error
B     B → S     B → S     E3

20. The FIRST and FOLLOW sets for the non-terminals A and B are [2012]
(A) FIRST (A) = {a, b, ∈} = FIRST (B)
FOLLOW (A) = {a, b}
FOLLOW (B) = {a, b, $}
(B) FIRST (A) = {a, b, $}
FIRST (B) = {a, b, ∈}
FOLLOW (A) = {a, b}
FOLLOW (B) = {$}
(C) FIRST (A) = {a, b, ∈} = FIRST (B)
FOLLOW (A) = {a, b}
FOLLOW (B) = ∅
(D) FIRST (A) = {a, b} = FIRST (B)
FOLLOW (A) = {a, b}
FOLLOW (B) = {a, b}
21. The appropriate entries for E1, E2, and E3 are  [2012]
(A) E1: S → a A b B, A → S
E2: S → b A a B, B → S
E3: B → S
(B) E1: S → a A b B, S → ∈
E2: S → b A a B, S → ∈
E3: S → ∈
(C) E1: S → a A b B, S → ∈
E2: S → b A a B, S → ∈
E3: B → S
(D) E1: A → S, S → ∈
E2: B → S, S → ∈
E3: B → S
22. What is the maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon- and unit-productions (i.e., of type A → ∈ and A → a) to parse a string with n tokens? [2013]
(A) n/2 (B) n – 1
(C) 2n – 1 (D) 2^n
23. Which of the following is/are undecidable?
(i)  G is a CFG. Is L (G) = φ?
(ii)  G is a CFG. Is L (G) = Σ*?
(iii)  M is a Turing machine. Is L (M) regular?
(iv)  A is a DFA and N is an NFA. Is L (A) = L (N)? [2013]
(A) (iii) only
(B) (iii) and (iv) only
(C) (i), (ii) and (iii) only
(D) (ii) and (iii) only
24. Consider the following two sets of LR (1) items of an LR (1) grammar. [2013]
X → c.X, c/d     X → c.X, $
X → .cX, c/d     X → .cX, $
X → .d, c/d      X → .d, $
Which of the following statements related to merging of the two sets in the corresponding LALR parser is/are FALSE?
(i) Cannot be merged since the lookaheads are different.
(ii)  Can be merged but will result in S-R conflict.
(iii)  Can be merged but will result in R-R conflict.
(iv) Cannot be merged since goto on c will lead to two different sets.
(A) (i) only (B) (ii) only
(C) (i) and (iv) only (D) (i), (ii), (iii) and (iv)
25. A canonical set of items is given below
S → L. > R
Q → R.
On input symbol < the set has [2014]
(A) A shift-reduce conflict and a reduce-reduce conflict.
(B) A shift-reduce conflict but not a reduce-reduce conflict.
(C) A reduce-reduce conflict but not a shift-reduce conflict.
(D) Neither a shift-reduce nor a reduce-reduce conflict.
26. Consider the grammar defined by the following production rules, with two operators * and +:
S → T * P
T → U|T * U
P → Q + P|Q
Q → Id
U → Id
Which one of the following is TRUE? [2014]
(A) + is left associative, while * is right associative
(B) + is right associative, while * is left associative
(C) Both + and * are right associative.
(D) Both + and * are left associative
27. Which one of the following problems is undecidable?  [2014]
(A) Deciding if a given context-free grammar is ambiguous.
(B) Deciding if a given string is generated by a given context-free grammar.
(C) Deciding if the language generated by a given context-free grammar is empty.
(D) Deciding if the language generated by a given context-free grammar is finite.
28. Which one of the following is TRUE at any valid state in shift-reduce parsing? [2015]
(A) Viable prefixes appear only at the bottom of the stack and not inside.
(B) Viable prefixes appear only at the top of the stack and not inside.
(C) The stack contains only a set of viable prefixes.
(D) The stack never contains viable prefixes.
29. Among simple LR (SLR), canonical LR, and look-ahead LR (LALR), which of the following pairs identify the method that is very easy to implement and the method that is the most powerful, in that order?  [2015]
(A) SLR, LALR
(B) Canonical LR, LALR
(C) SLR, canonical LR
(D) LALR, canonical LR
30. Consider the following grammar G:
S → F | H
F → p | c
H → d | c
where S, F and H are non-terminal symbols, and p, d and c are terminal symbols. Which of the following statement(s) is/are correct? [2015]
S1. LL (1) can parse all strings that are generated using grammar G
S2. LR (1) can parse all strings that are generated using grammar G
(A) Only S1 (B) Only S2
(C) Both S1 and S2 (D) Neither S1 nor S2
31. Match the following: [2016]
(P) Lexical analysis (i) Leftmost derivation
(Q) Top down parsing (ii) Type checking
(R) Semantic analysis (iii) Regular expressions
(S) Runtime environments (iv) Activation records
(A) P ↔ i, Q ↔ ii, R ↔ iv, S ↔ iii
(B) P ↔ iii, Q ↔ i, R ↔ ii, S ↔ iv
(C) P ↔ ii, Q ↔ iii, R ↔ i, S ↔ iv
(D) P ↔ iv, Q ↔ i, R ↔ ii, S ↔ iii
32. A student wrote two context-free grammars G1 and G2 for generating a single C-like array declaration. The dimension of the array is at least one.
For example, int a [10] [3];
The grammars use D as the start symbol, and use six terminal symbols int ; id [ ] num. [2016]

Grammar G1          Grammar G2
D → int L;          D → int L;
L → id [E           L → id E
E → num ]           E → E [num]
E → num ] [E        E → [num]

Which of the grammars correctly generate the declaration mentioned above?
(A) Both G1 and G2
(B) Only G1
(C) Only G2
(D) Neither G1 nor G2
33. Consider the following grammar:
P → xQRS
Q → yz | z
R → w | ε
S → y
What is FOLLOW (Q)? [2017]
(A) {R} (B) {w}
(C) {w, y} (D) {w, $}
34. Which of the following statements about parsers is/are CORRECT? [2017]
I. Canonical LR is more powerful than SLR.
II. SLR is more powerful than LALR.
III. SLR is more powerful than Canonical LR.
(A) I only (B) II only
(C) III only (D) I and III only
35. Consider the following expression grammar G:
E → E − T | T
T → T + F | F
F → (E) | id
Which of the following grammars is not left-recursive, but is equivalent to G? [2017]
(A) E → E − T | T
T → T + F | F
F → (E) | id
(B) E → TE′
E′ → −TE′ | ∈
T → T + F | F
F → (E) | id
(C) E → TX
X → −TX | ∈
T → FY
Y → +FY | ∈
F → (E) | id
(D) E → TX | (TX)
X → −TX | +TX | ∈
T → id
36. Which one of the following statements is FALSE?  [2018]
(A) Context-free grammar can be used to specify both lexical and syntax rules.
(B) Type checking is done before parsing.
(C) High-level language programs can be translated to different Intermediate Representations.
(D) Arguments to a function can be passed using the program stack.
37. A lexical analyzer uses the following patterns to recognize three tokens T1, T2, and T3 over the alphabet {a, b, c}:
T1: a?(b|c)*a
T2: b?(a|c)*b
T3: c?(b|a)*c
Note that ‘x?’ means 0 or 1 occurrence of the symbol x. Note also that the analyzer outputs the token that matches the longest possible prefix.
If the string bbaacabc is processed by the analyzer, which one of the following is the sequence of tokens it outputs? [2018]
(A) T1T2T3 (B) T1T1T3
(C) T2T1T3 (D) T3T3
38. Consider the following parse tree for the expression a#b$c$d#e#f, involving two binary operators $ and #.

            #
          /   \
         a     #
             /   \
            $     #
           / \   / \
          $   d e   f
         / \
        b   c

Which one of the following is correct for the given parse tree? [2018]
(A) $ has higher precedence and is left associative; # is right associative
(B) # has higher precedence and is left associative; $ is right associative
(C) $ has higher precedence and is left associative; # is left associative
(D) # has higher precedence and is right associative; $ is left associative
Answer Keys
Exercises
Practice Problems 1
1. D 2. A 3. B 4. A 5. D 6. C 7. C 8. D 9. C 10. A
11. C 12. C 13. A 14. B 15. D

Practice Problems 2
1. D 2. C 3. B 4. A 5. B 6. B 7. A 8. C 9. C 10. A
11. D 12. A 13. D 14. A 15. D 16. A 17. C 18. A 19. A

Previous Years’ Questions


1. B 2. D 3. B 4. A 5. D 6. A 7. B 8. A 9. C 10. A
11. C 12. B 13. D 14. C 15. B 16. B 17. B 18. C 19. A 20. A
21. C 22. B 23. D 24. D 25. D 26. B 27. A 28. C 29. C 30. D
31. B 32. A 33. C 34. A 35. C 36. B 37. D 38. A
