
Lecture Seven

Lecture Outline
▪ Overview of Bottom-Up Parsing
▪ Finite Automata of LR(0) Items and LR(0) Parsing
▪ SLR(1) Parsing
▪ General LR(1) and LALR(1) Parsing
▪ Yacc: An LALR(1) Parser Generator
▪ Error Recovery in Bottom-Up Parsers

Bottom-Up Parsing
▪ Bottom-up parsing algorithms are in general more powerful than top-down
methods. (For example, left recursion is not a problem in bottom-up parsing.)
▪ Almost all practical programming languages have an LR(1) grammar.
▪ All of the important bottom-up methods are really too complex for hand
coding.
▪ Begin at the leaves, build the parse tree in small segments, combine the small
trees to make bigger trees, until the root is reached.
▪ This process is called reduction of the sentence to the start symbol of the
grammar

Bottom-Up Parsing
▪ The most general bottom-up algorithm is called LR(1) parsing:
▪ the L indicates that the input is processed from left to right.
▪ the R indicates that a rightmost derivation is produced.
▪ the number 1 indicates that one symbol of lookahead is used.
▪ LR(0) parsing: no lookahead is needed in making parsing decisions. (This is
possible because a lookahead token can be examined after it appears on the
parsing stack, and if this happens it does not count as lookahead.)
▪ SLR(1) parsing (for simple LR(1) parsing): an improvement on LR(0) parsing.
▪ LALR(1) parsing (for lookahead LR(1) parsing): a method that is slightly
more powerful than SLR(1) parsing but less complex than general LR(1)
parsing.
Bottom-Up Parsing
▪ A bottom-up parser uses an explicit stack to perform a parse similar to a
nonrecursive top-down parser.
▪ The parsing stack will contain both tokens and nonterminals, and also some
extra state information.
▪ The stack is empty at the beginning of a bottom-up parse and will contain the
start symbol at the end of a successful parse
▪ A bottom-up parser has two possible actions (besides "accept"):
1. Shift a terminal from the front of the input to the top of the stack.
2. Reduce a string 𝛼 at the top of the stack to a nonterminal 𝐴, given the
grammar rule 𝐴 → 𝛼.
A bottom-up parser is thus sometimes called a shift-reduce parser.
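The two actions can be pictured as operations on an explicit stack. The short C sketch below is illustrative only (it is not from the slides); it hard-codes the trace of parsing the input ( ) with the parentheses grammar S' → S, S → ( S ) S | ε that is used in the later examples.

#include <stdio.h>

#define MAX 100
static char stack[MAX];          /* parsing stack of terminals and nonterminals */
static int  top = -1;            /* index of the top of the stack               */

/* Shift: move the next input terminal onto the stack. */
static void shift(char terminal) {
    stack[++top] = terminal;
}

/* Reduce by A -> alpha: pop |alpha| symbols and push the nonterminal A. */
static void reduce(char A, int rhs_length) {
    top -= rhs_length;
    stack[++top] = A;
}

int main(void) {
    shift('(');                  /* stack: (                              */
    reduce('S', 0);              /* stack: ( S       by S -> epsilon      */
    shift(')');                  /* stack: ( S )                          */
    reduce('S', 0);              /* stack: ( S ) S   by S -> epsilon      */
    reduce('S', 4);              /* stack: S         by S -> ( S ) S      */
    printf("stack top: %c (parser would now accept via S' -> S)\n", stack[top]);
    return 0;
}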
LR Parsers

▪ LR(0) parsing is constructed from LR(0) items.
▪ SLR(1) parsing is constructed from LR(0) items.
▪ LALR(1) parsing is constructed from LR(1) items.
Bottom-Up Parsing Example (1)

▪ Grammars are always augmented with a new start symbol. This means that if
S is the start symbol, a new start symbol S' is added to the grammar, with a
single unit production S' → S to the previous start symbol. (The grammar used
in the following examples is S → ( S ) S | ε, augmented with S' → S.)

Bottom-Up Parsing Example (2)

Bottom-Up Parsing
▪ In general, a bottom-up parser can shift input symbols onto the stack until it
determines what action to perform. However, it may need to look deeper into the
stack than just the top in order to decide which action that is. For example, in
the previous table, line 5 has S on the top of the stack, and the parser
performs a reduction by the production S → ( S ) S, while line 6 also has S on
the top of the stack, but the parser performs a reduction by S' → S.
▪ To know that S → ( S ) S is a valid reduction at step 5, we must know that the
stack actually does contain the string ( S ) S at that point. Thus, bottom-up
parsing requires arbitrary "stack lookahead."
▪ This is not nearly as serious as input lookahead, since the parser itself builds
the stack and can arrange for the appropriate information to be available. The
mechanism that will do this is a deterministic finite automaton of "items".
LR(0) items
▪ An LR(0) item of a context-free grammar is a production choice with a
distinguished position in its right-hand side. We will indicate this
distinguished position by a period (which, of course, becomes a meta symbol,
not to be confused with an actual token).
▪ If 𝐴 → 𝛼 is a production choice, and if 𝛽 and 𝛾 are any two strings of symbols
(including the empty string 𝜀) such that 𝛽𝛾 = 𝛼, then 𝐴 → 𝛽 . 𝛾 is an LR(0)
item. These are called LR(0) items because they contain no explicit reference
to lookahead.
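▪ For example, applying this definition to the productions referred to later in
these slides (the augmented parentheses grammar S' → S, S → ( S ) S | ε), the
production S → ( S ) S yields five LR(0) items:
S → . ( S ) S
S → ( . S ) S
S → ( S . ) S
S → ( S ) . S
S → ( S ) S .
The ε-production S → ε yields the single item S → . , and S' → S yields the
two items S' → . S and S' → S .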

LR(0) items
▪ The idea behind the concept of an item is that an item records an intermediate
step in the recognition of the right-hand side of a particular grammar rule
choice.
▪ In the grammar rule choice 𝐴 → 𝛼, with 𝛼 = 𝛽𝛾, the item
𝐴 → 𝛽 . 𝛾
means that 𝛽 has already been seen and that it may be possible to derive the
next input tokens from 𝛾.
▪ In terms of the parsing stack, this means that 𝛽 must appear at the top of the
stack.

Finite Automata of LR(0) items and LR(0) Parsing
▪ The LR(0) items can be used as the states of a finite automaton that maintains
information about the parsing stack and the progress of a shift-reduce parse.
This will start out as a nondeterministic finite automaton. From this NFA of
LR(0) items we can construct the DFA of sets of LR(0) items using the subset
construction algorithm. It is also easy to construct the DFA of sets of LR(0)
items directly.
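▪ Concretely, the NFA of items has two kinds of transitions (stated here as a
reminder of the standard construction): from an item A → β . X γ there is a
transition labeled X to the item A → β X . γ, and, when X is a nonterminal,
there are ε-transitions from A → β . X γ to every item X → . δ. The start
state is the item S' → . S of the augmentation production.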

Finite Automata of LR(0) items and LR(0) Parsing
▪ DFA

Finite Automata of LR(0) items and LR(0) Parsing
▪ Closure items: items that are added to a state during the ε-closure step.
▪ Kernel items: items that originate as the targets of non-ε-transitions.
▪ All closure items are initial items
▪ The kernel items uniquely determine the state and its transitions. Thus, only
kernel items need to be specified to completely characterize the DFA of sets of
items. Parser generators that construct the DFA may, therefore, only report the
kernel items (this is true of Yacc, for instance).
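▪ For instance, in the DFA of sets of items for the parentheses grammar used
earlier, the start state has the single kernel item S' → . S; the closure step
then adds the closure items S → . ( S ) S and S → . (from S → ε).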

LR(0) Grammar

LR(0) Example: DFA

LR(0) Example (2)

LR(0) Parsing Table
Lecture Nine

Errors
 Syntactic errors: violate grammar rules and are caught by compilers.
 Static semantic errors: e.g., identifiers that are not declared; caught by
compilers.
 Runtime errors: e.g., division by zero.
 Semantic errors: the program's meaning may differ from the programmer's
intention. The program may:
▪ Crash (stop running)
▪ Run forever
▪ Produce an answer, but not the desired one.

Semantic Analysis
 Parsing cannot catch some errors:
e.g. :
▪ Multiple declarations: a variable should be declared (in the same
scope) at most once.
▪ Undeclared variable: a variable should not be used without being
declared.
▪ Type mismatch: e.g., the type of the left-hand side of an assignment should
match the type of the right-hand side; y = y + 3 is an error if y is a string
(string + number).
▪ Wrong arguments: methods should be called with the right number
and types of arguments.
▪ Classes defined only once.
▪ Methods in a class defined only once.
Attribute Grammars
• Regular expressions are used to describe the scanner phase.
• Context-free grammars are used to describe the parser phase.
• Attribute grammars are a method of describing semantic analysis.
• An attribute is any property of a programming language construct, for example:
• The data type of a variable
• The value of an expression
• The location of a variable in memory
• Attributes are associated with the grammar symbols of the language. If X
is a grammar symbol, and a is an attribute associated to X, then we write
X.a for the value of a associated to X.

Attribute Grammars
 Example:
num → num digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 Grammar Rule: num → digit
 Semantic Rule: num.val = digit.val

 Grammar Rule: num1 → num2 digit
(The subscripts distinguish the two occurrences of num: the num on the right
will have a different value from the num on the left.)
 Semantic Rule: num1.val = num2.val * 10 + digit.val
Attribute Grammars
GRAMMAR RULE              SEMANTIC RULE
num1 → num2 digit         num1.val = num2.val * 10 + digit.val
num → digit               num.val = digit.val
digit → 0                 digit.val = 0
digit → 1                 digit.val = 1
digit → 2                 digit.val = 2
digit → 3                 digit.val = 3
digit → 4                 digit.val = 4
digit → 5                 digit.val = 5
digit → 6                 digit.val = 6
digit → 7                 digit.val = 7
digit → 8                 digit.val = 8
digit → 9                 digit.val = 9

Parse tree for the number 321, with the computed val attributes:

num (val = 32*10 + 1 = 321)
├── num (val = 3*10 + 2 = 32)
│   ├── num (val = 3)
│   │   └── digit (val = 3) → 3
│   └── digit (val = 2) → 2
└── digit (val = 1) → 1
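The synthesized val attribute can also be pictured as a small C function (an
illustrative sketch only, not part of the slides); each loop step applies the
semantic rule num1.val = num2.val * 10 + digit.val:

#include <stdio.h>

/* num -> num digit | digit     digit -> 0 | 1 | ... | 9
   val is synthesized bottom-up over the digits of the number. */
int num_val(const char *digits) {
    int val = *digits - '0';              /* num -> digit: num.val = digit.val    */
    for (digits++; *digits != '\0'; digits++)
        val = val * 10 + (*digits - '0'); /* num1.val = num2.val * 10 + digit.val */
    return val;
}

int main(void) {
    printf("%d\n", num_val("321"));       /* prints 321, as in the parse tree above */
    return 0;
}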

Attribute Grammars
The computation of attributes is described using equations or semantic rules.
There are two types of attributes:
• Synthesized attributes
Values computed from children
• Inherited attributes
Values computed from parent and siblings
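A small illustrative example of an inherited attribute (an assumed example, not
taken from these slides): in a declaration grammar such as

decl → type var-list
type → int | float

the data type is computed in the type subtree and then passed down to the
variable list with a semantic rule like var-list.dtype = type.dtype, so each
variable in the list obtains its type from a parent and sibling rather than
from its own children.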

Runtime Organization
When a program is invoked:
 The OS allocates space for the program
 The code is loaded into part of the space
 The OS jumps to the entry point (i.e., “main”)

The compiler is responsible for:
– Generating code
– Orchestrating use of the data area
Runtime Organization
 Code generation is dependent on the details of the
target machine.
 The memory of a typical computer is divided into a
register area and a slower directly addressable
random access memory (RAM).
 The RAM area may be further divided into a code
area and a data area
 The code area contains the program's code
– fixed size and read-only for most languages
 The static area contains data with fixed addresses (e.g.,
global data).
 The stack contains an activation record (AR) for each
currently active procedure.
 The heap contains all other data.
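A short C snippet (illustrative only, not from the slides) showing where
typical data ends up in this layout:

#include <stdio.h>
#include <stdlib.h>

int g = 42;                         /* global variable: static (global data) area       */

int f(void) {
    int local = 1;                  /* local variable: in f's activation record (stack) */
    int *p = malloc(sizeof *p);     /* dynamically allocated object: on the heap        */
    *p = local + g;
    int result = *p;
    free(p);
    return result;
}

int main(void) {                    /* the instructions themselves live in the code area */
    printf("%d\n", f());
    return 0;
}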

Fully static runtime environments
 The simplest kind of runtime environment, e.g., FORTRAN77.
 All data are static, fixed in memory for the duration of
program execution.
 No pointers, no dynamic allocation, and no recursive
functions.
 Each procedure has only a single activation record, which is
allocated statically prior to execution.
STACK-BASED RUNTIME ENVIRONMENTS
 In languages with recursive calls, and in which local
variables are newly allocated at each call, activation
records cannot be allocated statically.
 The stack of activation records (also referred to as the
runtime stack or call stack) grows and shrinks with the
different calls.

Example
#include <stdio.h>

int x = 15;
int y = 10;

int gcd(int u, int v) {
    if (v == 0) return u;
    else return gcd(v, u % v);
}

int main() {
    printf("%d", gcd(x, y));
    return 0;
}

STACK-BASED RUNTIME ENVIRONMENTS
 When the printf statement is executed, the runtime environment contains an
activation record for the call to main and the global/static area.
 The pointer to the current activation record is usually called the frame
pointer, or fp, and is usually kept in a register (often also
referred to as the fp).
 Information about the previous activation is commonly kept in the current
activation record as a pointer to the previous activation record; this pointer
is referred to as the control link.
 The stack pointer, or sp, always points to the last location allocated on the
call stack (sometimes this is called the top of stack pointer, or tos).
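One schematic way to picture an activation record together with these pointers
is the following C struct (a sketch with assumed field names; real activation
records are laid out by the compiler, not declared in the source program):

struct activation_record {
    int   args[2];                           /* arguments of the call, e.g. u and v for gcd */
    struct activation_record *control_link;  /* pointer to the caller's activation record   */
    void *return_address;                    /* where execution resumes in the caller       */
    int   locals[2];                         /* local variables and temporaries             */
};
/* fp points to the current activation_record; sp points to the last location
   allocated on the call stack (the "top of stack").                          */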

Code generation
 The code generation phase depends on:
• The target architecture.
• The structure of the runtime environment.
• The operating system of the target machine.

 In this lecture we will study the generation of intermediate code (a
universal, target-independent form of low-level code that, like assembly code,
must be processed further before it can be executed).
 Intermediate code is relatively target machine independent.
 Two popular forms of intermediate code: three-address code
and P-code.
Why Intermediate Code?
[Diagram: Without an intermediate code, each of the L source languages (L1–L4)
needs a separate code generator for each of the M target machines (M1–M3),
i.e., L*M code generators. With an intermediate code (IC) between front ends
and back ends, only L+M code generators are needed.]

Why Intermediate Code?
 Generating machine code directly from source code: with L source languages
and M target machines, L*M code generators are needed.
 Converting source code first to a machine-independent intermediate code:
with L languages and M target machines, only L+M code generators are needed.
(For example, 5 languages and 4 machines require 20 direct code generators,
but only 5 front ends plus 4 back ends with an intermediate code.)

Three-Address Code
The general form of a three-address instruction is x = y op z.
For example, the three-address code for 2*a + (b - 3) is:
t1 = 2 * a
t2 = b - 3
t3 = t1 + t2

Example
a+b*c-d/(b*e)
 t1 = b*c
 t2 = a+t1
 t3 = b*e
 t4 = d/t3
 t5 = t2-t4
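One common way to store three-address instructions internally (an assumed
representation, not prescribed by these slides) is as quadruples, one record
per instruction:

/* Quadruple: operator, two operands, and the result name. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } Op;

typedef struct {
    Op          op;       /* the operator                         */
    const char *arg1;     /* first operand (name, constant, temp) */
    const char *arg2;     /* second operand, or NULL              */
    const char *result;   /* temporary or variable being defined  */
} Quad;

/* The code for 2*a + (b - 3) from the previous slide as three quadruples: */
Quad code[] = {
    { OP_MUL, "2",  "a",  "t1" },   /* t1 = 2 * a   */
    { OP_SUB, "b",  "3",  "t2" },   /* t2 = b - 3   */
    { OP_ADD, "t1", "t2", "t3" },   /* t3 = t1 + t2 */
};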

Three-Address Code Instructions
If Statement
If (E) S

    t1 = E
    if_false t1 goto L1
    <code for S>
L1:
Exit

Three-Address Code Instructions
If Statement
If (E) S1 else S2

    t1 = E
    if_false t1 goto L1
    <code for S1>
    goto L2
L1:
    <code for S2>
L2:
Exit
Three-Address Code Instructions
While Statement
while (E) do S

L1:
    t1 = E
    if_false t1 goto L2
    <code for S>
    goto L1
L2:
Exit
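As a concrete instance of the while template (a worked example, not from the
slides), the loop while (v != 0) { t = u % v; u = v; v = t; } (an iterative
version of the earlier gcd example) translates to:

L1:
    t1 = v != 0
    if_false t1 goto L2
    t2 = u % v
    u = v
    v = t2
    goto L1
L2: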
