
Lecture-05

Main Topics:
1. FIRST and FOLLOW
2. Predictive Parser / LL(1) Parser

FIRST and FOLLOW (Very Important)

FIRST Set in Syntax Analysis


FIRST(X) for a grammar symbol X is the set of terminals that begin the strings derivable from
X. If X can derive Є, then Є is also in FIRST(X).
Rules to compute FIRST set:
1. If X is a terminal, then FIRST(X) = { X }.
2. If X -> Є is a production rule, then add Є to FIRST(X).
3. If X -> Y1 Y2 Y3 ... Yn is a production,
a. Add everything in FIRST(Y1) except Є to FIRST(X).
b. If FIRST(Y1) contains Є, also add { FIRST(Y2) – Є } to FIRST(X); continue in the same way
with Y3, Y4, ... for as long as every earlier Yi has Є in its FIRST set.
c. If FIRST(Yi) contains Є for all i = 1 to n, then add Є to FIRST(X).

Example 1:
Production Rules of Grammar
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

FIRST sets
FIRST(E) = FIRST(T) = FIRST(F)={ ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
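The rules above can be applied mechanically by repeating them until no FIRST set changes. The following is a minimal Python sketch of that idea, not a reference implementation; the names GRAMMAR, first_of_string and compute_first are illustrative assumptions, and the grammar is the one from Example 1 with Є written as "ε".

```python
EPS = "ε"  # marker for the empty string

# Example 1 grammar: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols ([] is the ε-production).
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # rule 1: a terminal's FIRST is itself
        result |= sym_first - {EPS}        # rules 3a / 3b
        if EPS not in sym_first:
            return result
    result.add(EPS)  # rules 2 / 3c: every symbol was nullable, or the string is empty
    return result


def compute_first(grammar):
    """Apply the rules repeatedly until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt, symbols in FIRST.items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```

The loop is needed because FIRST(E) depends on FIRST(T), which in turn depends on FIRST(F); a single pass over the productions is not enough in general.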

Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є

FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
= { d, g, h, Є } U { h, b } U { g, a } = { d, g, h, Є, b, a }
FIRST(A) = { d } U { FIRST(B) – Є } U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g , Є }
FIRST(C) = { h , Є }

Notes:
1. The grammar used above is a Context-Free Grammar (CFG). The syntax of most
programming languages can be specified using a CFG.
2. A CFG production is of the form A -> B, where A is a single Non-Terminal and B is a string of
grammar symbols (i.e. Terminals as well as Non-Terminals), possibly empty.

FOLLOW Set in Syntax Analysis


FOLLOW(X) is the set of terminals that can appear immediately to the right of Non-Terminal
X in some sentential form.

Example:
S -> Aa | Ac
A -> b

    S           S
   / \         / \
  A   a       A   c
  |           |
  b           b

Here, FOLLOW (A) = {a, c}

Rules to compute FOLLOW set:

1) FOLLOW(S) contains $ // where S is the starting Non-Terminal

2) If A -> pBq is a production, where B is a Non-Terminal and p and q are any strings of
grammar symbols, then everything in FIRST(q) except Є is in FOLLOW(B).

3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).

4) If A -> pBq is a production and FIRST(q) contains Є,
then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A).

Example 1:

Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }

FOLLOW Set
FOLLOW(E) = { $ , ) } // Note ')' is there because of the 5th production rule, F -> (E)
FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
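The four FOLLOW rules can be sketched in the same fixed-point style as FIRST. The Python below is an illustrative sketch under the assumption that the grammar is stored as a dictionary of alternatives; compute_follow and the other names are hypothetical, and the grammar is Example 1 again.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; ε is included if the whole sequence is nullable."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # terminals stand for themselves
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:       # FOLLOW is only defined for non-terminals
                        continue
                    rest = first_of_string(alt[i + 1:], first)
                    new = rest - {EPS}           # rule 2
                    if EPS in rest:              # rules 3 and 4 (rest empty or nullable)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
for nt, symbols in FOLLOW.items():
    print(f"FOLLOW({nt}) = {sorted(symbols)}")
```

Rule 3 falls out of the same code path as rule 4, because FIRST of an empty remainder is { ε }.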

Example 2:

Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є

FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }

FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }

Example 3:

Production Rules:
S -> ACB|Cbb|Ba
A -> da|BC
B-> g|Є
C-> h| Є

FIRST set
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba) = { d, g, h, Є, b, a }
FIRST(A) = { d } U { FIRST(B) – Є } U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }

FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }

Note :
1. Є never appears in a FOLLOW set (Є is the empty string, not an input symbol).
2. $ is called the end-marker. It represents the end of the input string and is used while parsing
to indicate that the input string has been completely processed.
3. The grammar used above is a Context-Free Grammar (CFG). The syntax of a programming
language can be specified using a CFG.
4. A CFG production is of the form A -> B, where A is a single Non-Terminal and B is a string of
grammar symbols (i.e. Terminals as well as Non-Terminals), possibly empty.
***Why do we need FIRST and FOLLOW?

TOP-Down Parsing
We have learnt in the last chapter that the top-down parsing technique parses the input, and starts
constructing a parse tree from the root node, gradually moving down to the leaf nodes. The types of
top-down parsing are described below:
Recursive Descent Parsing
Recursive descent is a top-down parsing technique that constructs the parse tree from the top
and the input is read from left to right. It uses procedures for every terminal and non terminal
entity. This parsing technique recursively parses the input to make a parse tree, which may or
may not require back-tracking. But the grammar associated with it (if not left factored) cannot
avoid back-tracking. A form of recursive-descent parsing that does not require any
back-tracking is known as predictive parsing.
This parsing technique is regarded recursive as it uses context-free grammar which is recursive
in nature.
Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string against
the production rules to replace them (if matched). To understand this, take the following
example of CFG:
S → rXd | rZd

X → oa | ea
Z → ai

For the input string "read", a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of
the input, i.e. ‘r’. The very first production of S (S → rXd) matches it. So the top-down parser
advances to the next input letter (i.e. ‘e’). The parser tries to expand non-terminal ‘X’ and
checks its first production from the left (X → oa). It does not match the next input symbol. So
the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in order. The string is accepted.
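The walk-through above can be reproduced with a small backtracking recogniser. This is only a sketch in Python under the assumption that every terminal is a single character; parse, accepts and the trace list are illustrative names, not part of any library.

```python
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}


def parse(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try the alternatives from left to right; moving on to the next
        # alternative after a failed one is the backtracking step.
        for alt in GRAMMAR[head]:
            trace.append(f"try {head} -> {''.join(alt)} at position {pos}")
            yield from parse(alt + rest, text, pos, trace)
    elif pos < len(text) and text[pos] == head:
        yield from parse(rest, text, pos + 1, trace)  # terminal matched, advance


def accepts(text, trace=None):
    """True if the whole of `text` can be derived from S."""
    trace = [] if trace is None else trace
    return any(end == len(text) for end in parse(["S"], text, 0, trace))


steps = []
print(accepts("read", steps))
for step in steps:
    print(step)
```

For "read" the trace records S → rXd, then the failed X → oa, then the successful X → ea, which is the order described in the text.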

Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the
next input symbols. To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse
tree. Both the stack and the input contain an end symbol $ to denote that the stack is empty
and the input is consumed. The parser refers to the parsing table to take any decision on the
input and stack element combination.

In recursive descent parsing, the parser may have more than one production to choose from
for a single instance of input, whereas in a predictive parser, each step has at most one
production to choose. There might be instances where no production matches the
input string, causing the parsing procedure to fail.
LL Parser
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar but with
some restrictions to get the simplified version, in order to achieve easy implementation. LL
grammar can be implemented by means of both algorithms namely, recursive-descent or
table-driven.
An LL parser is denoted as LL(k). The first L in LL(k) stands for parsing the input from left to
right, the second L stands for left-most derivation, and k itself represents the number of
look-ahead symbols. Generally k = 1, in which case the parser is written as LL(1).
LL Parsing Algorithm
We may stick to deterministic LL(1) for parser explanation, as the size of table grows
exponentially with the value of k. Secondly, if a given grammar is not LL(1), then usually, it is
not LL(k), for any given k.
Given below is an algorithm for LL(1) Parsing:
Input:
    string ω
    parsing table M for grammar G

Output:
    If ω is in L(G) then left-most derivation of ω,
    error otherwise.

Initial State : $S on stack (with S being start symbol)
                ω$ in the input buffer

SET ip to point the first symbol of ω$.

repeat
    let X be the top stack symbol and a the symbol pointed by ip.

    if X ∈ Vt or $
        if X = a
            POP X and advance ip.
        else
            error()
        endif

    else    /* X is non-terminal */
        if M[X,a] = X → Y1 Y2 ... Yk
            POP X
            PUSH Yk, Yk-1, ... Y1    /* Y1 on top */
            Output the production X → Y1 Y2 ... Yk
        else
            error()
        endif
    endif
until X = $    /* empty stack */
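As a concrete illustration, the algorithm can be sketched in Python for the expression grammar E → TE’, E’ → +TE’ | Є, T → FT’, T’ → *FT’ | Є, F → (E) | id. The table entries below are written out by hand, and the names TABLE, NON_TERMINALS and ll1_parse are assumptions made for this sketch; the input is assumed to be already split into tokens.

```python
END = "$"

# Parsing table M for the expression grammar; an empty list is an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions of the left-most derivation, or raise SyntaxError."""
    buffer = list(tokens) + [END]   # ω$
    stack = [END, start]            # $S, with the top of the stack at the end of the list
    ip = 0
    output = []
    while True:
        top, a = stack[-1], buffer[ip]
        if top not in NON_TERMINALS:            # X is a terminal or $
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            ip += 1
            if top == END:                      # stack is empty and input is consumed
                return output
        else:                                   # X is a non-terminal
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no table entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(body))        # push Yk ... Y1 so that Y1 ends up on top
            output.append(f"{top} -> {' '.join(body) or 'ε'}")


for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

Pushing the right-hand side in reverse order is what makes the derivation left-most: Y1 is expanded or matched before Y2 is ever looked at.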

When is a grammar G LL(1)?

A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G:
∙ for no terminal a do both α and β derive strings beginning with a.

∙ at most one of α and β can derive the empty string.

∙ if β derives the empty string, then α does not derive any string beginning with a terminal in
FOLLOW(A) (and likewise with α and β swapped).
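These conditions can be checked pair by pair once FIRST of each alternative and FOLLOW of each non-terminal are known. The Python below is a rough sketch that takes those sets as given (here copied by hand for the expression grammar) rather than computing them; ll1_violations and the dictionary layout are assumptions for illustration.

```python
from itertools import combinations

EPS = "ε"


def ll1_violations(alt_first, follow):
    """List every pair of alternatives that breaks one of the three conditions.

    alt_first maps A to {alternative: FIRST(alternative)};
    follow maps A to FOLLOW(A).
    """
    found = []
    for head, alts in alt_first.items():
        for (alpha, fa), (beta, fb) in combinations(alts.items(), 2):
            if (fa & fb) - {EPS}:
                found.append((head, alpha, beta, "share a first terminal"))
            if EPS in fa and EPS in fb:
                found.append((head, alpha, beta, "both derive the empty string"))
            if (EPS in fb and fa & follow[head]) or (EPS in fa and fb & follow[head]):
                found.append((head, alpha, beta, "FIRST overlaps FOLLOW"))
    return found


# Non-terminals of the expression grammar that have more than one alternative.
EXPR_ALT_FIRST = {
    "E'": {"+TE'": {"+"}, "ε": {EPS}},
    "T'": {"*FT'": {"*"}, "ε": {EPS}},
    "F":  {"(E)": {"("}, "id": {"id"}},
}
EXPR_FOLLOW = {
    "E'": {"$", ")"},
    "T'": {"+", "$", ")"},
    "F":  {"*", "+", "$", ")"},
}

print(ll1_violations(EXPR_ALT_FIRST, EXPR_FOLLOW))
```

An empty result means no pair of alternatives violates the conditions, so the grammar passes the LL(1) test.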

Difference between Recursive Predictive Descent Parser (Recursive-Descent Parser) and
Non-Recursive Predictive Descent Parser (Predictive Parser):

1. Recursive-Descent Parser: It is a technique which may or may not require backtracking.
   Predictive Parser: It is a technique that does not require any kind of backtracking.

2. Recursive-Descent Parser: It uses a procedure for every non-terminal entity to parse strings.
   Predictive Parser: It finds the production to use by looking up the parsing table for the
   current stack symbol and input symbol.

3. Recursive-Descent Parser: It is a type of top-down parsing built from a set of mutually
   recursive procedures, where each procedure implements one of the non-terminals of the grammar.
   Predictive Parser: It is a type of top-down approach, and a form of recursive-descent parsing
   that does not use backtracking.

4. Recursive-Descent Parser: It contains several small functions, one for each non-terminal in
   the grammar.
   Predictive Parser: It uses a look-ahead pointer which points to the next input symbols. To
   make the parser backtracking free, it puts some constraints on the grammar.

5. Recursive-Descent Parser: With backtracking it accepts a wider class of grammars, although
   left-recursive grammars must be rewritten first.
   Predictive Parser: It accepts only a class of grammar known as LL(k) grammar.

Construction of LL(1) Parsing Table (Very Important)

To construct the Parsing table, we have two functions:

1: First(): The set of Terminal Symbols that can begin the strings derived from a variable
(Non-Terminal).
2: Follow(): The set of Terminal Symbols that can follow a variable in the process of derivation.

Now, after computing the First and Follow set for each Non-Terminal symbol we have to
construct the Parsing table. In the table Rows will contain the Non-Terminals and the
column will contain the Terminal Symbols.

For a production A --> α, the production is placed in row A under every Terminal in First(α).
If α is a Null Production (or can derive the empty string), the production is also placed in
row A under every element of Follow(A).
Now, let’s understand with an example.
Example-1:
Consider the Grammar:

E --> TE'
E' --> +TE' | e
T --> FT'
T' --> *FT' | e
F --> id | (E)
**e denotes epsilon

Find their first and follow sets:

Production          First        Follow
E  --> TE'          { id, ( }    { $, ) }
E' --> +TE' | e     { +, e }     { $, ) }
T  --> FT'          { id, ( }    { +, $, ) }
T' --> *FT' | e     { *, e }     { +, $, ) }
F  --> id | (E)     { id, ( }    { *, +, $, ) }

Now, the LL(1) Parsing Table is:

      id            +             *             (             )             $
E     E --> TE'                                 E --> TE'
E'                  E' --> +TE'                               E' --> e      E' --> e
T     T --> FT'                                 T --> FT'
T'                  T' --> e      T' --> *FT'                 T' --> e      T' --> e
F     F --> id                                  F --> (E)

As you can see, all the null productions are placed under the elements of the Follow set of that
symbol, and all the remaining productions lie under the elements of the First set of that symbol.
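The placement of productions into cells can be sketched in Python as follows. The First and Follow sets are taken as given from Example-1 rather than computed, epsilon is written as "ε" instead of e, and build_table and first_of_string are illustrative names. Each cell holds a list so that a cell receiving more than one production is visible.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the null production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
FIRST = {
    "E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
    "T'": {"*", EPS}, "F": {"id", "("},
}
FOLLOW = {
    "E": {END, ")"}, "E'": {END, ")"}, "T": {"+", END, ")"},
    "T'": {"+", END, ")"}, "F": {"*", "+", END, ")"},
}


def first_of_string(symbols, first):
    """First of a production body; ε is included if the whole body can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # terminals stand for themselves
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (non-terminal, terminal) to the list of productions placed in that cell."""
    table = {}
    for head, alternatives in grammar.items():
        for body in alternatives:
            body_first = first_of_string(body, first)
            columns = body_first - {EPS}       # ordinary productions: under First
            if EPS in body_first:
                columns |= follow[head]        # null productions: under Follow
            for terminal in columns:
                table.setdefault((head, terminal), []).append(body)
    return table


TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
for (head, terminal), bodies in sorted(TABLE.items()):
    for body in bodies:
        print(f"M[{head}, {terminal}] = {head} --> {' '.join(body) or EPS}")
```

Cells that never receive a production are simply absent from the dictionary; they correspond to the blank (error) entries of the table.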
Note: Not every grammar is suitable for an LL(1) Parsing table. It is possible that one cell
contains more than one production.
Let’s see with an example.

Example-2:
Consider the Grammar

S --> A | a
A --> a
Find their first and follow sets:

Production       First     Follow
S --> A | a      { a }     { $ }
A --> a          { a }     { $ }

Parsing Table:

      a                     $
S     S --> A, S --> a
A     A --> a

Here, we can see that there are two productions in the same cell. Hence, this grammar is
not suitable for an LL(1) Parser.
Important Notes
1. If a grammar has two alternatives with a common prefix (i.e. it is not left-factored), then it
cannot be LL(1).
Eg : S -> aS | a ---- both productions go in column a
2. If a grammar contains left recursion, it cannot be LL(1).
Eg : S -> Sa | b
FIRST(S) = { b }, so S -> Sa goes in column b
S -> b also goes in column b; thus cell [S, b] has 2 entries, hence not LL(1)
3. If a grammar is ambiguous, then it cannot be LL(1).
4. Not every regular grammar is LL(1), because a regular grammar may contain common
prefixes, left recursion or ambiguity.
