TRAJECTORY EDUCATION
UGCNET/COMPUTER SCIENCE/SLOT 10
CHAPTER 3
Revision of last chapter's readings
o High-level languages have to be processed by either a compiler or an interpreter
o Difference between compiler and interpreter
o Compiler steps
   o Analysis ( carried out by the front end )
      o Lexical analysis
         o A generic lexical analyzer - lex
      o Syntax analysis
         o Context-free grammar
         o How to build parse trees
         o A generic parser - yacc
      o Semantic analysis
   o Synthesis ( carried out by the back end )
      o Intermediate code generation
      o Code optimization
      o Object code generation
Continuing syntax analysis :

We have already talked about derivations and their graphical representation, i.e. the parse tree.
There are two types of derivation:
Main Office, 126 2nd Floor, Kingsway Camp, Delhi-09, 011-47041845, www.trajectoryeducation.com
o Leftmost derivation : at each step, leftmost non-terminal is replaced;

e.g. E => E * E => id * E => id * id
o Rightmost derivation : at each step, rightmost non-terminal is replaced;
e.g. E => E * E => E * id => id * id
Every parse tree has a unique leftmost derivation and a unique rightmost derivation.
Note that a sentence can have many parse trees, but each parse tree has exactly one
leftmost ( and one rightmost ) derivation.
A parse tree is always evaluated bottom-up and from left to right.
Ambiguity :
A grammar is ambiguous if a sentence has more than one parse tree, i.e., more than one
leftmost (or rightmost) derivation of a sentence is possible.
Example : Given the grammar ( set of productions)
E -> E + E
E -> E * E
E -> id
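To make the ambiguity concrete, here is a minimal Python sketch that counts how many parse trees a token list has under the grammar above. The function name count_parse_trees and the sample sentence id + id * id are illustrative assumptions, not part of the original notes.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree:
        # the left subtree covers tokens[i:k], the right one tokens[k+1:j].
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


sentence = ["id", "+", "id", "*", "id"]
print(count_parse_trees(sentence))  # more than 1 means the sentence is ambiguous
```

For id + id * id the count is 2: one tree has + at the root and the other has * at the root, which is exactly why the grammar is ambiguous.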
How to resolve ambiguity :

Write the unambiguous grammar. This can be achieved by defining precedence rules with
extra non-terminals.
Example : Another ambiguous grammar
E -> E + E | E - E | E * E | E / E | -E | id
Problem : How to convert the above ambiguous grammar into an unambiguous one.
Solution : Apply precedence rules with extra non-terminals.

Usual precedence order from highest to lowest is : - (unary minus), * | /, + | -
Golden rule : Build the grammar from lowest to highest precedence
Goal -> Expr
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Factor
Factor -> -Primary | Primary
Primary -> id
Now the leftmost derivation for - id + id * id is
Goal => Expr
=> Expr + Term
=> Term + Term
=> Factor + Term
=> - Primary + Term
=> - id + Term
=> - id + Term * Factor
=> - id + Factor * Factor
=> - id + Primary * Factor
=> - id + id * Factor
=> - id + id * Primary
=> - id + id * id
There are three new non-terminals ( Term, Factor, Primary ). With this grammar, the sentence
above cannot have two parse trees.
Parser : A program that, given a sentence, reconstructs a derivation for that sentence ---- if
done successfully, it recognizes the sentence. All parsers read their input left-to-right, but
construct the parse tree differently.
There are two types of parsers.

o Top-down parsers --- construct the tree from root to leaves
o Bottom-up parsers --- construct the tree from leaves to root
Top-down parser : ( LL parser - left-to-right scan, leftmost derivation )

It attempts to derive a string matching the source string through a sequence of derivations
starting with the start symbol of the grammar.
In other words, it constructs the parse tree by starting at the start symbol and guessing at
each derivation step.
It uses the next input symbol from the sentence to guide the guessing.
For a valid input string a, a top-down parse thus determines a derivation sequence
S => ... => ... => a

In top-down parsing every derivation step is leftmost while the input string is being matched,
which is why top-down parsing is also termed LL parsing. The first L is because the parser
reads the input sequence from left to right, and the second L stands for leftmost derivation.
There are three main concepts in top-down parsing
o Start symbol - Selection of the start symbol ( root of the parse tree ) is very important.
o Guessing the right derivation, one that can lead to a match with the input sentence. This is
called 'prediction'.
o If the guess is wrong, then one needs to revert the guess and try again. This is called
'backtracking'.
High level flow of top-down parsing :

Step 1 : Identify the start symbol and start with it
Step 2 : Guess a production ( one that can lead to a match with the input ) and apply it -- Prediction
Step 3 : Match the input string
Step 4 : If it matches, go to step 2 until the complete sentence is matched. Otherwise the guess
was wrong: revert the derivation and go to step 2 -- Backtrack
Backtracking is needed only when a prediction fails to match the input string.
Some disadvantages of top-down parsing
Two problems arise from the possibility of backtracking
o Semantic analysis cannot be performed while making a prediction. The action must be
delayed until the prediction is known to be part of a successful parse, i.e. until then you do not
know whether the prediction is correct.
o A source string is known to be erroneous only after all predictions have failed. This
makes it very inefficient.
Based on prediction and backtracking, top-down parsers can be divided into two
categories
o Recursive-Descent Parsing ( RD ) - A top-down parser with backtracking
   o Backtracking is needed ( if a choice of production rule does not work, we
     backtrack to try other alternatives ).
   o It is a general parsing technique, but not widely used because it is not efficient. It can be
     used for quick and dirty parsing.
   o At each derivation step it tries the alternatives on the RHS of a production from left to right.
   o A grammar with right recursion is suitable for this and does not enter an infinite
     loop while making predictions.
Why the name recursive-descent ?
   o The parser is recursive in nature ( recursive derivations )
   o Descent because it goes from top to bottom
Example :
S -> aBc
B -> bc | b ( Here it tries bc for B first and then b )
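The prediction and backtracking steps can be sketched in Python for this small grammar. This is only an illustration under stated assumptions: the names GRAMMAR, derive and parse are invented here, each character of the input is treated as one token, and the trace messages are not part of the original notes.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],   # alternatives are tried in this order
}


def derive(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: predict each alternative in turn.
        for alternative in GRAMMAR[head]:
            trace.append(f"predict {head} -> {''.join(alternative)}")
            for mid in derive(alternative, text, pos, trace):
                yield from derive(rest, text, mid, trace)
            # Reached only when this alternative has no (more) ways to succeed.
            trace.append(f"backtrack from {head} -> {''.join(alternative)}")
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the next input symbol.
        yield from derive(rest, text, pos + 1, trace)


def parse(text):
    """Return (accepted, trace) for the input string `text`."""
    trace = []
    accepted = any(end == len(text) for end in derive(["S"], text, 0, trace))
    return accepted, trace


accepted, trace = parse("abc")
print(accepted)
for step in trace:
    print(step)
```

For the input abc the parser first predicts B -> bc, finds that no c is left for S -> aBc, backtracks, and then succeeds with B -> b.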
Grammar suitable for recursive-descent top-down parsing

Grammars containing left recursion ( the NT appears at the left end of the RHS of its own
production ) are not suitable for top-down parsing.
Example : for the string id + id * id
E -> E + T | T
T -> T * V | V
V -> <id>
The first derivation step would be
E => E + T
Now E has to be replaced, because in top-down parsing a leftmost derivation takes place.

With recursive-descent parsing, E will again be replaced by E + T, which
creates an infinite loop of predictions.
Grammars containing right recursion are suitable for top-down parsing and never
enter an infinite loop. However, this method is time consuming and error-prone for large
grammars.
Example : The above grammar can be rewritten with right recursion as follows.
E -> T + E | T
T -> V * T | V
V -> <id>
The first derivation step would be
E => T + E
T has to be replaced ( as top-down parsing uses a leftmost derivation ). Here is the complete
sequence
E => T + E
=> V + E
=> <id> + E
=> <id> + T
=> <id> + V * T
=> <id> + <id> * T
=> <id> + <id> * V
=> <id> + <id> * <id>
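The difference between the two grammars can be seen in a small Python sketch with one procedure per non-terminal. The function names are assumptions made for this illustration, and naive_left_recursive_E is a deliberately stripped-down stand-in for what happens when E -> E + T is predicted first: it re-enters itself before any token is consumed.

```python
def naive_left_recursive_E(tokens, pos):
    """E -> E + T | T, coded naively with E -> E + T predicted first."""
    # The leftmost symbol of the right side is E itself, so the procedure
    # calls itself at the same position before consuming any token.
    return naive_left_recursive_E(tokens, pos)


def parse_V(tokens, pos):
    """V -> <id>. Return the position after V, or None on failure."""
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    return None


def parse_T(tokens, pos):
    """T -> V * T | V"""
    after_v = parse_V(tokens, pos)
    if after_v is None:
        return None
    if after_v < len(tokens) and tokens[after_v] == "*":
        after_t = parse_T(tokens, after_v + 1)
        if after_t is not None:
            return after_t          # T -> V * T succeeded
    return after_v                  # fall back to T -> V


def parse_E(tokens, pos):
    """E -> T + E | T"""
    after_t = parse_T(tokens, pos)
    if after_t is None:
        return None
    if after_t < len(tokens) and tokens[after_t] == "+":
        after_e = parse_E(tokens, after_t + 1)
        if after_e is not None:
            return after_e          # E -> T + E succeeded
    return after_t                  # fall back to E -> T


def recognizes(tokens):
    """True if the whole token list is derived from E (right-recursive grammar)."""
    return parse_E(tokens, 0) == len(tokens)


def left_recursion_loops(tokens):
    """True if the naive left-recursive procedure never terminates normally."""
    try:
        naive_left_recursive_E(tokens, 0)
    except RecursionError:
        return True
    return False


tokens = ["id", "+", "id", "*", "id"]
print(recognizes(tokens), left_recursion_loops(tokens))
```

In the right-recursive version every recursive call happens only after at least one token has been consumed, so the recursion depth is bounded by the length of the input.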
o Predictive Parsing ( PP ) - ( Also called recursive predictive parsing )
   o Predictive parsing is a special form of recursive-descent parsing without backtracking.
   o Efficient
   o Needs a special form of CFG known as an LL(k) grammar. It is possible only for LL(k)
     grammars.
LL(k) grammars are the context-free grammars for which there exists some positive
integer k that allows a recursive-descent parser to decide which production to use by
examining only the next k tokens of input. Here are some of their properties.
   o Subset of the CFGs
   o Permits deterministic left-to-right recognition with a lookahead of k symbols
   o Builds the parse tree top-down
   o If a conflict-free parse table can be constructed for the grammar, then it is LL(k); if it
     cannot, it is not LL(k)
   o Each LL(k) grammar is unambiguous
   o An LL(k) grammar has no left recursion. It may have right recursion; in the left-factored
     grammars shown in these notes, every right-recursive non-terminal also has a production
     for epsilon. With left recursion the parser can loop forever, so no lookahead of k symbols
     is enough to make the right prediction.
Given a left recursive grammar ( or right recursive grammar ), it can often be converted
to an LL(k) grammar by removing left recursion and applying left factoring.
Left factoring : Take the common parts of productions and form a new non-terminal. After left
factoring, each production ( i.e. each non-terminal ) becomes non-recursive or right recursive.
If the production is right recursive, then there is also a production for e ( epsilon ).
Examples : How to convert a left recursive grammar into an LL(k) grammar
E -> E + T | T
T -> T * V | V      ----> Left recursive ( not suitable for any top-down parsing )
V -> <id>
    |
    v
E -> T + E | T
T -> V * T | V      ----> Right recursive ( suitable for top-down recursive-descent parsing )
V -> <id>
    |
    v
E  -> T E'
E' -> + T E' | e    ( Note that every recursive non-terminal has a production deriving e ( epsilon ) )
T  -> V T'          ----> Left factored LL(k) grammar ( suitable for top-down predictive parsing )
T' -> * V T' | e
V  -> <id>
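A predictive ( no backtracking ) recursive-descent parser for the left-factored grammar can be sketched in Python as follows. The class name PredictiveParser, the "$" end marker and the textual form of the recorded productions are assumptions made for this illustration. Each procedure looks only at the next token to choose its production.

```python
class PredictiveParser:
    """One procedure per non-terminal; decisions use a single lookahead token."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0
        self.productions = []                # productions applied, in order

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):
        self.productions.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead == "+":
            self.productions.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:
            # Epsilon is chosen for any other token; a wrong token is
            # reported later by the enclosing match().
            self.productions.append("E' -> e")

    def T(self):
        self.productions.append("T -> V T'")
        self.V()
        self.T_prime()

    def T_prime(self):
        if self.lookahead == "*":
            self.productions.append("T' -> * V T'")
            self.match("*")
            self.V()
            self.T_prime()
        else:
            self.productions.append("T' -> e")

    def V(self):
        self.productions.append("V -> <id>")
        self.match("id")


def parse(tokens):
    """Return the productions of the leftmost derivation, or raise SyntaxError."""
    parser = PredictiveParser(tokens)
    parser.E()
    parser.match("$")
    return parser.productions


for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```

Because the grammar is left factored, no procedure ever has to undo a choice: the lookahead token alone decides between the recursive alternative and e.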
Other examples on how to reduce grammar
The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars
that contain left recursion.
LL(1) --> a recursive-descent parser can decide which production to apply by examining only
the next one token of input.
A predictive parser that uses an LL(1) grammar is known as an LL(1) parser.
Something more about the LL(1) parser
o LL(1) means that
   the input is processed left-to-right
   a leftmost derivation is constructed
   the method uses at most one lookahead token
o An LL(1) parser is a table-driven parser for LL parsing.
o The '1' in LL(1) indicates that the parser uses a look-ahead of one source
symbol, i.e. the prediction to be made is
determined by the next source symbol.
o It expects an LL(1) grammar.
There are two important concepts in LL(1) parsing
o Parsing table and algorithm to create parsing table
o Algorithm for derivations

About parsing table :
o The parsing table has a row for each Non terminal(NT) in production rules
o The parsing table has a column for each Terminal(T) in production rules
o A parsing table entry PT(NT, T) indicates what prediction should be made if
NT is the leftmost non-terminal in a sentential form
And T is the next source symbol.
o A blank entry in the parsing table indicates an error. Multiple entries in a cell
indicate a conflict, which tells us that the grammar is not LL(1).
There must be at most one entry in a cell.
o There is a special column which marks the end of the input, and it is labelled
$ or |
Algorithm to create the parsing table :
a ∈ First(alpha)  <=>  alpha can derive a string starting with a
b ∈ Follow(A)  <=>  b can follow a string derived from A
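The two definitions above can be turned into a small Python sketch that computes First, Follow and the parsing table. It assumes the left-factored expression grammar from earlier in these notes, writes epsilon as "e" and the end marker as "$", and uses the standard textbook rule for filling the table: A -> alpha goes into PT(A, a) for every a in First(alpha), and into PT(A, b) for every b in Follow(A) when alpha can derive e. All names are illustrative.

```python
EPS, END = "e", "$"          # assumes no real terminal is spelled "e" or "$"
GRAMMAR = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", [EPS]),
    ("T",  ["V", "T'"]),
    ("T'", ["*", "V", "T'"]),
    ("T'", [EPS]),
    ("V",  ["id"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_of(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol in the sequence can vanish
    return result


def compute_first():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:           # repeat until no set grows any more
        changed = False
        for head, body in GRAMMAR:
            new = first_of(body, first)
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of(body[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:          # the rest can vanish: inherit Follow(head)
                    new |= follow[head]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def build_table(first, follow):
    table = {}
    for head, body in GRAMMAR:
        body_first = first_of(body, first)
        lookaheads = body_first - {EPS}
        if EPS in body_first:
            lookaheads |= follow[head]
        for terminal in lookaheads:
            if (head, terminal) in table:
                raise ValueError("multiple entries: grammar is not LL(1)")
            table[head, terminal] = body
    return table


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
TABLE = build_table(FIRST, FOLLOW)
for (nonterminal, terminal), body in sorted(TABLE.items()):
    print(f"PT({nonterminal}, {terminal}) = {nonterminal} -> {' '.join(body)}")
```

Any pair ( NT, T ) missing from TABLE corresponds to a blank, i.e. error, entry in the table.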
Example :
Here is the example of LL(1) grammar for arithmetic operation and the
corresponding table
Grammar :
Input string :
Parsing Table :
LL(1) parsing algorithm
{Input : A string w and a parsing table M for grammar G.}
{Output : If w is in L(G), a leftmost derivation of w; otherwise, an error indication.}
Initially, the parser is in a configuration in which it has $S on the stack with S, the start
symbol of G, on top, and w$ in the input buffer.
Set ip to point to the first symbol of w$.
Repeat
    Let X be the top stack symbol and a the symbol pointed to by ip.
    if X is a terminal or $
        if X = a
            Pop X from the stack and advance ip.
        else
            error()
        end if
    else
        {X is a nonterminal}
        if M[X, a] = X -> Y1 Y2 ... Yk
            Pop X from the stack
            Push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top
            Output the production X -> Y1 Y2 ... Yk
        else
            error()
        end if
    end if
until X = $
{stack is empty}
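The algorithm above translates fairly directly into Python. The sketch below hard-codes a parsing table for the left-factored grammar E -> T E', E' -> + T E' | e, T -> V T', T' -> * V T' | e, V -> id; that table is an assumption made for this illustration ( the table printed in the original handout is not reproduced in these notes ), and an empty list stands for an e production.

```python
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],                 # E' -> e
    ("T", "id"): ["V", "T'"],
    ("T'", "*"): ["*", "V", "T'"],
    ("T'", "+"): [],                 # T' -> e
    ("T'", "$"): [],                 # T' -> e
    ("V", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "V"}


def ll1_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]    # input followed by the end marker
    stack = ["$", start]             # start symbol on top
    ip = 0
    output = []
    while True:
        X = stack[-1]
        a = buffer[ip]
        if X not in NONTERMINALS:    # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
            if X == "$":             # stack is now empty: success
                return output
        else:                        # X is a non-terminal
            body = TABLE.get((X, a))
            if body is None:         # blank table entry
                raise SyntaxError(f"no prediction for ({X}, {a})")
            stack.pop()
            stack.extend(reversed(body))   # Y1 ends up on top
            output.append(f"{X} -> {' '.join(body) or 'e'}")


for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The stack holds the part of the sentential form that is still to be matched, so the sequence of productions printed is a leftmost derivation of the input.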
Parsing steps :
Bottom-up parser : ( Constructs the parse tree bottom-up --- from the leaves to the root )
As the name suggests, bottom-up parsing works in the opposite direction from top-down. A
top-down parser begins with the start symbol at the top of the parse tree and works downward,
driving productions in forward order until it gets to the terminal leaves. A bottom-up parse
starts with the string of terminals itself and builds from the leaves upward, working
backwards to the start symbol by applying the productions in reverse. Along the way, a
bottom-up parser searches for substrings of the working string that match the right side of
some production. When it finds such a substring, it reduces it, i.e., substitutes the left side
nonterminal for the matching right side. The goal is to reduce all the way up to the start
symbol and report a successful parse.
In general, bottom-up parsing algorithms are more powerful than top-down methods, but not
surprisingly, the constructions required are also more complex. It is difficult to write a
bottom-up parser by hand for anything but trivial grammars, but fortunately, there are
excellent parser generator tools like yacc that build a parser from an input specification.
Some features of bottom-up parsing
o Bottom-up parsing always constructs a rightmost derivation, in reverse order
o It attempts to build trees upward toward the start symbol.
o More complex than top-down, but efficient
Types of bottom-up parser ( 2 types - shift-reduce and precedence )
Shift reduce parser
Shift-reduce parsing is the most commonly used and the most powerful of the bottom-up
techniques. It takes as input a stream of tokens and develops the list of productions used to
build the parse tree, but the productions are discovered in reverse order of a top-down parser.
Like a table-driven predictive parser, a bottom-up parser makes use of a stack to keep track
of the position in the parse and a parsing table to determine what to do next.
To illustrate stack-based shift-reduce parsing, consider this simplified expression grammar:
S -> E
E -> T | E + T
T -> id | (E)
The shift-reduce strategy divides the string that we are trying to parse into two parts: an
undigested part and a semi-digested part.
The undigested part contains the tokens that are still to come in the input, and the
semi-digested part is put on a stack. If parsing the string v, it starts out completely undigested, so
the input is initialized to v, and the stack is initialized to empty. A shift-reduce parser
proceeds by taking one of three actions at each step:
o Reduce: If we can find a rule A -> w, and if the contents of the stack are qw for
some q (q may be empty), then we can reduce the stack to qA. We are applying the
production for the nonterminal A backwards. There is also one special case: reducing the
entire contents of the stack to the start symbol with no remaining input means we have
recognized the input as a valid sentence (e.g., the stack contains just w, the input is
empty, and we apply S -> w). This is the last step in a successful parse. The w being
reduced is referred to as a handle.
o Shift: If it is impossible to perform a reduction and there are tokens remaining in the
undigested input, then we transfer a token from the input onto the stack. This is called a
shift. For example, using the grammar above, suppose the stack contained ( and the input
contained id+id). It is impossible to perform a reduction on ( since it does not match
the entire right side of any of our productions. So, we shift the first token of the input
onto the stack, giving us (id on the stack and +id) remaining in the input.
o Error: If neither of the two above cases apply, we have an error. If the sequence on the
stack does not match the right-hand side of any production, we cannot reduce. And if
shifting the next input token would create a sequence on the stack that cannot eventually
be reduced to the start symbol, a shift action would be futile. Thus, we have hit a dead
end where the next token conclusively determines the input cannot form a valid sentence.
This would happen in the above grammar on the input id+). The first id would be shifted,
then reduced to T and again to E, next + is shifted. At this point, the stack contains E+ and
the next input token is ). The sequence on the stack cannot be reduced, and shifting the )
would create a sequence that is not viable, so we have an error.
The general idea is to read tokens from the input and push them onto the stack attempting to
build sequences that we recognize as the right side of a production. When we find a match,
we replace that sequence with the nonterminal from the left side and continue working our
way up the parse tree. This process builds the parse tree from the leaves upward, the inverse
of the top-down parser. If all goes well, we will end up moving everything from the input to
the stack and eventually construct a sequence on the stack that we recognize as a right-hand
side for the start symbol.
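The shift and reduce actions can be traced with a short Python sketch for the grammar S -> E, E -> T | E + T, T -> id | (E). This is a simplification for this one grammar, not a general LR parser: it reduces whenever the top of the stack matches a right side, trying longer right sides first, and it reports an error only once nothing more can be shifted or reduced. The names RULES and shift_reduce are assumptions.

```python
RULES = [                       # (left side, right side); longer right sides first
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["id"]),
]


def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = []                  # semi-digested part
    remaining = list(tokens)    # undigested part
    steps = []
    while True:
        for left, right in RULES:
            if stack[-len(right):] == right:
                # Reduce: replace the handle on top of the stack by `left`.
                stack[-len(right):] = [left]
                steps.append(f"reduce {left} -> {' '.join(right)}")
                break
        else:
            if remaining:
                # Shift: move the next input token onto the stack.
                stack.append(remaining.pop(0))
                steps.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                # Special case: whole input reduced to the start symbol.
                steps.append("reduce S -> E (accept)")
                return steps
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")


for step in shift_reduce(["id", "+", "id"]):
    print(step)
```

For the erroneous input id+) this sketch shifts the ) before giving up, whereas the parser described above detects the error one step earlier by consulting its table before shifting.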
Example :
Grammar :
Input :
Another example :
Another Example: E -> E + E | E * E | ( E ) | a | b | c
Conflicts in shift-reduce parsing : ambiguous grammars lead to parsing conflicts;
conflicts can be fixed by rewriting the grammar, or by making a decision during parsing.
o shift/reduce (SR) conflicts : choose between a reduce and a shift action, e.g.
S -> if E then S | if E then S else S | ......
o reduce/reduce (RR) conflicts : choose between two reductions
LR Parsing : table driven shift reduce parser

LR parsers ("L" for left to right scan of input, "R" for rightmost derivation) are efficient,
table-driven shift-reduce parsers.
The class of grammars that can be parsed using LR methods is a proper superset of the class
of grammars that can be parsed with predictive LL parsers. In fact, virtually all programming
language constructs for which CFGs can be written can be parsed with LR techniques. As an
added advantage, there is no need for lots of grammar rearrangement to make it acceptable
for LR parsing the way that LL parsing requires. The primary disadvantage is the amount of
work it takes to build the tables by hand, which makes it infeasible to hand-code an LR
parser for most grammars. Fortunately, there are LR parser generators that create the parser
from an unambiguous CFG specification. The parser tool does all the tedious and complex
work to build the necessary tables and can report any ambiguities or language constructs that
interfere with the ability to parse it using LR techniques. Rather than reading and shifting
tokens onto a stack, an LR parser pushes "states" onto the stack; these states describe what is
on the stack so far.
An LR parser uses two tables:

1. The action table : Action[s,a] tells the parser what to do when the state on top of the
stack is s and terminal a is the next input token. The possible actions are to shift a state onto
the stack, to reduce the handle on top of the stack, to accept the input, or to report an error.
2. The goto table : Goto[s,X] indicates the new state to place on top of the stack after a
reduction of the nonterminal X while state s is on top of the stack.
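How the two tables drive the parser can be sketched in Python. The grammar used here ( 1: S -> ( S ), 2: S -> x ) and its hand-built ACTION and GOTO tables are hypothetical and chosen only because they are small; they do not come from the original notes.

```python
# Hypothetical grammar for illustration:
#   1: S -> ( S )
#   2: S -> x
PRODUCTIONS = {1: ("S", ["(", "S", ")"]), 2: ("S", ["x"])}

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(tokens):
    """Return the trace of actions, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = [0]                       # the stack holds states, not tokens
    ip = 0
    trace = []
    while True:
        state, a = stack[-1], buffer[ip]
        kind, arg = ACTION.get((state, a), ("error", None))
        if kind == "shift":
            stack.append(arg)         # push the new state
            ip += 1
            trace.append(f"shift {a}")
        elif kind == "reduce":
            left, right = PRODUCTIONS[arg]
            del stack[-len(right):]   # pop one state per symbol of the handle
            stack.append(GOTO[stack[-1], left])
            trace.append(f"reduce {left} -> {' '.join(right)}")
        elif kind == "accept":
            return trace
        else:
            raise SyntaxError(f"unexpected {a!r} in state {state}")


for step in lr_parse(["(", "(", "x", ")", ")"]):
    print(step)
```

After each reduction the state uncovered on the stack, together with the non-terminal just produced, selects the next state through the goto table.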
LR Parser Types
There are three types of LR parsers: LR(k), simple LR(k), and lookahead LR(k)
(abbreviated to LR(k), SLR(k), and LALR(k)). The k identifies the number of tokens of
lookahead. We will usually only concern ourselves with 0 or 1 tokens of lookahead, but the
techniques do generalize to k > 1.
Here are some widely used LR parsers based on value of k.
o LR(0) - No lookahead symbol
o SLR(1) - Simple with one lookahead symbol
o LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to
implement. YACC deals with this kind of grammar.
o LR(1) - Most general grammar, but most complex to implement.
LR(0) is the simplest of all the LR parsing methods. It is also the weakest and although of
theoretical importance, it is not used much in practice because of its limitations. LR(0)
parses without using any lookahead at all. Adding just one token of lookahead to get LR(1)
vastly increases the parsing power. Very few grammars can be parsed with LR(0), but most
unambiguous CFGs can be parsed with LR(1). The drawback of adding the lookahead is that
the algorithm becomes somewhat more complex and the parsing table gets much, much
bigger. The full LR(1) parsing table for a typical programming language has many thousands
of states compared to the few hundred needed for LR(0). A compromise in the middle is
found in the two variants SLR(1) and LALR(1) which also use one token of lookahead but
employ techniques to keep the table as small as LR(0). SLR(k) is an improvement over
LR(0) but much weaker than full LR(k) in terms of the number of grammars for which it is
applicable. LALR(k) parses a larger set of languages than SLR(k) but not quite as many as
LR(k). LALR(1) is the method used by the yacc parser generator.
Precedence parser
o Simple precedence parser
o Operator-precedence parser
o Extended precedence parser
Questions from the previous papers

1. Explain the function of the software tool YACC.
2. With the help of diagram explain the process of parsing. What is the output generated
after parsing process ?
3. What is semantic analysis ? Explain the semantic analysis of an arithmetic expression
with an example.
4. Explain the principle of lex and yacc. How do they communicate with each other ? OR
5. Explain the utility of Lex and Yacc in the construction of a compiler.
6. Explain the different phases of a compiler.

7. What is a cross compiler ?
8. What is a bootstrap compiler ?
Bootstrapping is a term used in computer science to describe the techniques involved in
writing a compiler (or assembler) in the target programming language which it is intended to
compile. Because the compiler compiles itself, improvements to the compiler's back end improve
not only general-purpose programs but also the compiler itself. It is also a comprehensive
consistency check, as the compiler should be able to reproduce its own object code.
Earlier versions are written for a subset of the language; the compiler then compiles itself
and is incrementally completed.
o Explain lexical analysis. Which tool can be used to generate the lexical analyzer ?
Explain a bit about the tool.
o Explain the various tasks performed during lexical analysis. Also explain the relevance of
regular expressions in lexical analysis.
o What is a context-free grammar ? Write down a CFG for the for loop of the 'C' language.
o What is a symbol table ?
An essential function of a compiler is to record the identifiers and the relevant information
about their attributes: type, scope and, in the case of a procedure or function, its name,
arguments and return type. A symbol table is a table containing a record for each identifier,
with fields for the attributes of the identifier. This table is used by all the phases of the
compiler to access the data as well as to report errors.
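One simple way to picture such a table is a stack of dictionaries, one per scope. The Python sketch below is only an illustration: the class names Symbol and SymbolTable, the chosen fields and the scope handling are assumptions, and real compilers organize this in many different ways.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    kind: str                       # "variable" or "function"
    type: str                       # declared type, or return type of a function
    params: list = field(default_factory=list)   # argument types of a function


class SymbolTable:
    def __init__(self):
        self.scopes = [{}]          # the innermost scope is the last element

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, symbol):
        scope = self.scopes[-1]
        if symbol.name in scope:
            raise NameError(f"{symbol.name!r} is already declared in this scope")
        scope[symbol.name] = symbol

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.declare(Symbol("max", "function", "int", ["int", "int"]))
table.enter_scope()
table.declare(Symbol("a", "variable", "int"))
print(table.lookup("a").type, table.lookup("max").params)
table.exit_scope()
```

The two NameError cases correspond to the error reporting mentioned above: a duplicate declaration in the same scope and a use of an undeclared identifier.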
o Generate parse trees for the following sentences based on the standard arithmetic CFG
a - b * c
a + b * c - d / ( e * f )
a + b * c - d + e - f / ( g + h )
a + b * c / d + e - f
a / b + c * d + e - f
9 * 7 + 5 - 2
Use the following grammar if none is given :
E -> E + T | E - T | T
T -> T * V | T / V | V
V -> <id> | (E)
a - b * c
First of all find a starting prediction. ( Always start from the lowest-precedence operator in the
input string and move towards higher precedence. )
E => E - T
=> T - T
=> V - T
=> <id> - T
=> <id> - T * V
=> <id> - V * V
=> <id> - <id> * V
=> <id> - <id> * <id>
a + b * c - d / ( e * f ) ( there are two operators at the lowest precedence, + and -; choose the
rightmost one, i.e. - )
E => E - T
=> E + T - T
=> T + T - T
=> V + T - T
=> <id> + T - T
=> <id> + T * V - T
=> <id> + V * V - T
=> <id> + <id> * V - T
=> <id> + <id> * <id> - T
=> <id> + <id> * <id> - T / V
=> <id> + <id> * <id> - V / V
=> <id> + <id> * <id> - <id> / V
=> <id> + <id> * <id> - <id> / (E)
=> <id> + <id> * <id> - <id> / (T)
=> <id> + <id> * <id> - <id> / (T * V)
=> <id> + <id> * <id> - <id> / (V * V)
=> <id> + <id> * <id> - <id> / (<id> * V)
=> <id> + <id> * <id> - <id> / (<id> * <id>)