UGCNET/COMPUTER SCIENCE/SLOT 10
CHAPTER 3
Revision of last Chapter readings
1 High-level languages have to be processed by either a compiler or an interpreter
1 Difference between compiler and interpreter
1 Compiler steps
a Analysis (carried out by the front end)
i Lexical analysis
1 A generic lexical analyzer - lex
Syntax analysis
1 Context Free Grammar
1 How to build parse trees
1 A generic parser - yacc
Semantic analysis
Code optimization
Main Office, 126 2nd Floor, Kingsway Camp, Delhi-09, 011-47041845, www.trajectoryeducation.com
TRAJECTORY EDUCATION
E * E |
E / E |
- E |
id
Problem: How can the above ambiguous grammar be converted into an unambiguous one?
-id + id * Primary
=> - id + id * id
There are three new non-terminals (Term, Factor, Primary). With the above grammar, the
above sentence cannot have two different parse trees.
Parser: A program that, given a sentence, reconstructs a derivation for that sentence. If it
succeeds, it recognizes the sentence. All the parsers covered here read their input left-to-right, but
construct the parse tree differently.
If the guess is wrong, the parser has to undo the guess and try another one. This is called
'backtracking'.
Step 2 : Guess a production (one that can lead to a match with the input) and apply it -- Prediction
Step 3 : Match the input string
Step 4 : If it matches, go to step 2 until the complete sentence is matched. Otherwise the guess
was wrong: undo the derivation and go to step 2 -- Backtrack
If every prediction matches the input string, no backtracking is needed; otherwise the parser backtracks.
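The predict / match / backtrack cycle can be sketched in a few lines of Python. This is only a minimal illustration, not a production parser. The grammar (E => T + E | T, T => V * T | V, V => id) and the names GRAMMAR, derive and parse are assumptions chosen for the example; the grammar is right-recursive so that the guessing cannot loop forever.

```python
# Hypothetical right-recursive grammar used only for this illustration.
GRAMMAR = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["V", "*", "T"], ["V"]],
    "V": [["id"]],
}


def derive(symbols, tokens):
    """Return True if the list of grammar symbols can derive exactly `tokens`."""
    if not symbols:
        return not tokens  # success only if the input is also used up
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal symbol: it must match the next input token.
        return bool(tokens) and tokens[0] == head and derive(rest, tokens[1:])
    for production in GRAMMAR[head]:
        # Prediction: guess this production and try to match the input.
        if derive(production + rest, tokens):
            return True
        # Wrong guess: fall through to the next production (backtrack).
    return False


def parse(tokens):
    """Recognize a token list starting from the start symbol E."""
    return derive(["E"], tokens)


if __name__ == "__main__":
    print(parse(["id", "+", "id", "*", "id"]))
    print(parse(["id", "+"]))
```

Note that the erroneous input is rejected only after every alternative has been tried, which is exactly the inefficiency that backtracking introduces.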
Some disadvantages of top-down parsing
Two problems arise from the possibility of backtracking:
a Semantic analysis cannot be performed while making a prediction. The action must be
delayed until the prediction is known to be part of a successful parse, i.e. until you know
whether the prediction is correct.
a A source string is known to be erroneous only after all predictions have failed. This
makes error detection very inefficient.
Based on prediction and backtracking, top-down parsers fall into two
categories:
1 Recursive-Descent Parsing (RD) - A top-down parser with backtracking
It is a general parsing technique, but not widely used because it is not efficient. It can be used
for quick and dirty parsing.
A grammar with right recursion is suitable for it, because the parser does not enter an infinite
loop while making predictions.
Efficient
Needs a special form of CFG known as an LL(k) grammar. It is possible only for LL(k)
grammars.
LL(k) grammars are the context-free grammars for which there exists some positive
integer k that allows a recursive descent parser to decide which production to use by
examining only the next k tokens of input. Here are some of their properties.
Subset of CFGs
Permits deterministic left-to-right recognition with a lookahead of k symbols
Builds the parse tree top-down
If a parse table with no conflicting entries can be constructed for the grammar, then it is LL(k); if it cannot, it
is not LL(k)
Each LL(k) grammar is unambiguous
An LL(k) grammar has no left recursion. It may have right recursion, but a right-recursive
nonterminal must also have a non-recursive alternative (often an epsilon production) so that
the recursion can end. With left recursion the parser could loop forever without consuming
input, so no lookahead of k symbols would be enough to make the right prediction.
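Since left recursion rules a grammar out of LL(k), it is usually removed before predictive parsing. The sketch below shows the standard textbook rewrite for immediate left recursion (A => A alpha | beta becomes A => beta A' and A' => alpha A' | epsilon). The function name eliminate_left_recursion and the list-of-lists grammar representation are assumptions made for this example; an empty list stands for epsilon.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from the productions of one nonterminal.

    `productions` is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping nonterminals to their new right-hand sides.
    """
    # Right-hand sides of the form A => A alpha (keep only alpha).
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # Right-hand sides of the form A => beta.
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to do
    new_name = nonterminal + "'"
    return {
        nonterminal: [beta + [new_name] for beta in others],
        # The empty list [] is the epsilon production.
        new_name: [alpha + [new_name] for alpha in recursive] + [[]],
    }


if __name__ == "__main__":
    # E => E + T | T   becomes   E => T E'   and   E' => + T E' | epsilon
    print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

This handles only immediate left recursion in a single nonterminal; indirect left recursion needs the more general algorithm found in compiler textbooks.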
V => <id>
    |
    |
    v
E => T + E | T
T => V * T | V      ---> right recursive (suitable for top-down recursive-descent parsing)
V => <id>
    |
    |
    v
E  => T E'
E' => + T E' | e    (e = epsilon)
T  => V T'          ---> left-factored LL(k) grammar (suitable for top-down predictive parsing)
T' => * V T' | e
V  => <id>
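A left-factored grammar like E => T E', E' => + T E' | e, T => V T', T' => * V T' | e, V => id maps directly onto a predictive recursive-descent parser: one procedure per nonterminal, and a single lookahead token decides which alternative to take. The following is a minimal sketch; the class name PredictiveParser, the helper accepts and the "$" end marker are assumptions made for this illustration.

```python
class PredictiveParser:
    """Recursive-descent recognizer with one token of lookahead, no backtracking."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):            # E => T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' => + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # any other lookahead: choose the epsilon production

    def T(self):            # T => V T'
        self.V()
        self.T_prime()

    def T_prime(self):      # T' => * V T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.V()
            self.T_prime()

    def V(self):            # V => id
        self.match("id")


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    parser = PredictiveParser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(accepts(["id", "+", "id", "*", "id"]))
    print(accepts(["id", "*"]))
```

Because every choice is settled by looking at the next token, a wrong input is rejected at the first offending token instead of after trying every alternative.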
The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars
that contain left recursion.
LL(1) --> a recursive descent parser can decide which production to apply by examining only
the next one token of input.
The predictive parser which uses an LL(1) grammar is known as an LL(1) parser.
Something more about the LL(1) parser
o LL(1) means that
the input is processed left-to-right
a leftmost derivation is constructed
the method uses at most one lookahead token
o An LL(1) parser is a table-driven parser for LL parsing (left-to-right scan, leftmost derivation).
o The '1' in LL(1) indicates that the parser uses a lookahead of one source
symbol, i.e. the prediction to be made is
determined by the next source symbol.
o It expects an LL(1) grammar.
There are two important concepts in LL(1) parsing
o Parsing table and algorithm to create parsing table
Example :
Here is the example of LL(1) grammar for arithmetic operation and the
corresponding table
Grammar :
Input string :
Parsing Table :
    {X is a nonterminal}
    if M[X, a] = X -> Y1 Y2 ... Yk then
        Push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top
        Output the production X -> Y1 Y2 ... Yk
    else
        error()
    end if
end if
until X = $    {stack is empty}
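The table-driven loop can be sketched in Python as follows. This is a hedged illustration rather than the table from the original example. It assumes the left-factored grammar E => T E', E' => + T E' | e, T => V T', T' => * V T' | e, V => id, and the parsing table TABLE was filled in by hand for that grammar. The names TABLE, NONTERMINALS and ll1_parse are hypothetical.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "V"}

# Parsing table M[X, a]: maps (nonterminal, lookahead) to a right-hand side.
# An empty list is the epsilon production. Hand-built for the assumed grammar.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["V", "T'"],
    ("T'", "*"): ["*", "V", "T'"],
    ("T'", "+"): [],
    ("T'", "$"): [],
    ("V", "id"): ["id"],
}


def ll1_parse(tokens, start="E"):
    """Return the list of productions used, in leftmost-derivation order."""
    stack = ["$", start]            # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    productions = []
    while stack[-1] != "$":
        top, lookahead = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{top}, {lookahead}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1, leaving Y1 on top
            productions.append(f"{top} => {' '.join(rhs) or 'e'}")
        elif top == lookahead:
            stack.pop()                   # terminal matches: consume the token
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    if tokens[pos] != "$":
        raise SyntaxError("input left over after the stack emptied")
    return productions


if __name__ == "__main__":
    for production in ll1_parse(["id", "+", "id"]):
        print(production)
```

The only grammar-specific part is TABLE; the driver loop stays the same for any LL(1) grammar.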
Parsing steps :
Bottom-up parser: (Constructs the parse tree bottom-up, from the leaves to the root.) As the
name suggests, bottom-up parsing works in the opposite direction from top-down. A top-down parser begins with the start symbol at the top of the parse tree and works downward,
driving productions in forward order until it gets to the terminal leaves. A bottom-up parse
starts with the string of terminals itself and builds from the leaves upward, working
backwards to the start symbol by applying the productions in reverse. Along the way, a
bottom-up parser searches for substrings of the working string that match the right side of
some production. When it finds such a substring, it reduces it, i.e., substitutes the left side
nonterminal for the matching right side. The goal is to reduce all the way up to the start
symbol and report a successful parse.
In general, bottom-up parsing algorithms are more powerful than top-down methods, but not
surprisingly, the constructions required are also more complex. It is difficult to write a
bottom-up parser by hand for anything but trivial grammars, but fortunately, there are
excellent parser generator tools like yacc that build a parser from an input specification.
Some features of bottom-up parsing
o Bottom-up parsing always constructs a right-most derivation, in reverse
o It attempts to build trees upward toward the start symbol.
o More complex than top-down, but efficient
Types of bottom-up parser (2 types - shift-reduce and precedence)
Shift-reduce parser
Shift-reduce parsing is the most commonly used and the most powerful of the bottom-up
techniques. It takes as input a stream of tokens and develops the list of productions used to
build the parse tree, but the productions are discovered in the reverse order of a top-down parser.
Like a table-driven predictive parser, a bottom-up parser makes use of a stack to keep track
of the position in the parse and a parsing table to determine what to do next.
To illustrate stack-based shift-reduce parsing, consider this simplified expression grammar:
S => E
E => T | E + T
T => id | (E)
The shift-reduce strategy divides the string that we are trying to parse into two parts: an
undigested part and a semi-digested part.
The undigested part contains the tokens that are still to come in the input, and the semi-digested part is put on a stack. If parsing the string v, it starts out completely undigested, so
the input is initialized to v, and the stack is initialized to empty. A shift-reduce parser
proceeds by taking one of three actions at each step:
o Reduce: If we can find a rule A => w, and if the contents of the stack are qw for
some q (q may be empty), then we can reduce the stack to qA. We are applying the
production for the nonterminal A backwards. There is also one special case: reducing the
entire contents of the stack to the start symbol with no remaining input means we have
recognized the input as a valid sentence (e.g., the stack contains just w, the input is
empty, and we apply S => w). This is the last step in a successful parse. The w being
reduced is referred to as a handle.
o Shift: If it is impossible to perform a reduction and there are tokens remaining in the
undigested input, then we transfer a token from the input onto the stack. This is called a
shift. For example, using the grammar above, suppose the stack contained ( and the input
contained id+id). It is impossible to perform a reduction on ( since it does not match
the entire right side of any of our productions. So, we shift the first character of the input
onto the stack, giving us (id on the stack and +id) remaining in the input.
o Error: If neither of the two above cases apply, we have an error. If the sequence on the
stack does not match the right-hand side of any production, we cannot reduce. And if
shifting the next input token would create a sequence on the stack that cannot eventually
be reduced to the start symbol, a shift action would be futile. Thus, we have hit a dead
end where the next token conclusively determines the input cannot form a valid sentence.
This would happen in the above grammar on the input id+). The first id would be shifted,
then reduced to T and again to E, next + is shifted. At this point, the stack contains E+ and
the next input token is ). The sequence on the stack cannot be reduced, and shifting the )
would create a sequence that is not viable, so we have an error.
The general idea is to read tokens from the input and push them onto the stack attempting to
build sequences that we recognize as the right side of a production. When we find a match,
we replace that sequence with the nonterminal from the left side and continue working our
way up the parse tree. This process builds the parse tree from the leaves upward, the inverse
of the top-down parser. If all goes well, we will end up moving everything from the input to
the stack and eventually construct a sequence on the stack that we recognize as a right-hand
side for the start symbol.
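The shift / reduce / error actions can be tried out with a small Python sketch for the simplified expression grammar (S => E, E => T | E + T, T => id | (E)). It uses a naive strategy, assumed here purely for illustration: reduce whenever the top of the stack matches a right-hand side, trying longer right-hand sides first, and shift otherwise. That happens to work for this small grammar, but a real shift-reduce parser consults a parsing table instead. The names RULES and shift_reduce are hypothetical.

```python
# (left side, right side) pairs, longer right-hand sides listed first so that
# E + T is reduced as a whole before T alone is considered.
RULES = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("T", ["id"]),
    ("E", ["T"]),
]


def shift_reduce(tokens):
    """Return the list of actions taken; raise SyntaxError on invalid input."""
    stack, remaining, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the handle on top of the stack by lhs.
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs} => {' '.join(rhs)}")
                break
        else:
            if remaining:
                # Shift: move the next input token onto the stack.
                stack.append(remaining.pop(0))
                trace.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                # Special case: whole input reduced to the start symbol.
                trace.append("reduce S => E (accept)")
                return trace
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")


if __name__ == "__main__":
    for action in shift_reduce(["id", "+", "id"]):
        print(action)
```

Unlike a table-driven parser, this naive version reports the error in id+) only after shifting the ), because it has no way of knowing in advance that the shift is futile.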
Example :
Grammar :
Input :
Another example :
LR Parser Types
There are three types of LR parsers: LR(k), simple LR(k), and lookahead LR(k)
(abbreviated to LR(k), SLR(k), and LALR(k)). The k identifies the number of tokens of
lookahead. We will usually only concern ourselves with 0 or 1 tokens of lookahead, but the
techniques do generalize to k > 1.
Here are some widely used LR parsers based on value of k.
o LR(0) - No lookahead symbol
o SLR(1) - Simple with one lookahead symbol
o LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to
implement. YACC deals with this kind of grammar.
o LR(1) - Most general grammar, but most complex to implement.
LR(0) is the simplest of all the LR parsing methods. It is also the weakest and although of
theoretical importance, it is not used much in practice because of its limitations. LR(0)
parses without using any lookahead at all. Adding just one token of lookahead to get LR(1)
vastly increases the parsing power. Very few grammars can be parsed with LR(0), but most
unambiguous CFGs can be parsed with LR(1). The drawback of adding the lookahead is that
the algorithm becomes somewhat more complex and the parsing table gets much, much
bigger. The full LR(1) parsing table for a typical programming language has many thousands
of states compared to the few hundred needed for LR(0). A compromise in the middle is
found in the two variants SLR(1) and LALR(1) which also use one token of lookahead but
employ techniques to keep the table as small as LR(0). SLR(k) is an improvement over
LR(0) but much weaker than full LR(k) in terms of the number of grammars for which it is
applicable. LALR(k) parses a larger set of languages than SLR(k) but not quite as many as
LR(k). LALR(1) is the method used by the yacc parser generator.
o Precedence parser
Simple precedence parser
Operator-precedence parser
Extended precedence parser
1 Explain lexical analysis. Which tool can be used to generate the lexical analyzer?
Explain a bit about the tool.
1 Explain the various tasks performed during lexical analysis. Also explain the relevance of
regular expressions in lexical analysis.
1 What is a context-free grammar? Write down a CFG for the for loop of the 'C' language.
1 What is a symbol table?
An essential function of a compiler is to record the identifiers and the relevant information
about them: attribute type, scope and, in the case of a procedure or function, its name, arguments
and return type. A symbol table is a table containing a record for each identifier, with fields for
the attributes of the identifier. This table is used by all the phases of the compiler to access the data
as well as to report errors.
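A symbol table of this kind can be sketched as a stack of dictionaries, one per scope. This is only one possible layout, assumed here for illustration. The class name SymbolTable, its methods and the attribute fields (type, scope_level) are hypothetical and not taken from any particular compiler.

```python
class SymbolTable:
    """One record per identifier, organized as a stack of nested scopes."""

    def __init__(self):
        self.scopes = [{}]          # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_, **attributes):
        """Record an identifier in the current scope with its attributes."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"redeclaration of {name!r}")
        scope[name] = {
            "type": type_,
            "scope_level": len(self.scopes) - 1,
            **attributes,           # e.g. arguments and return type of a function
        }

    def lookup(self, name):
        """Search from the innermost scope outward; report undeclared names."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier {name!r}")


if __name__ == "__main__":
    table = SymbolTable()
    table.declare("x", "int")
    table.declare("f", "function", arguments=["int"], return_type="float")
    table.enter_scope()
    table.declare("x", "float")     # shadows the global x
    print(table.lookup("x"))
    table.exit_scope()
    print(table.lookup("x"))
```

The two error cases (redeclaration and use of an undeclared identifier) show how the same table serves both data access and error reporting.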
1 Generate parse tree for following sentences based on standard arithmetic CFG
a - b * c
a + b * c - d / (e * f)
a + b * c - d + e - f / (g + h)
a + b * c / d + e - f
a / b + c * d + e - f
9 * 7 + 5 - 2
Use the following grammar if none is given:
E => E + T | E - T | T
T => T * V | T / V | V
V => <id> | (E)
a - b * c
First of all, find a starting prediction. (Always work from lower to higher precedence in the input
string.)
E => E - T
=> T - T
=> V - T
=> <id> - T
=> <id> - T * V