
LESSON 04

All the ROUGH pages included from lessons 1, 2, 3, 4, 5… are not included in the paper.
Overview of Previous Lesson(s)
Overview…
 Decomposition of a compiler.

Symbol Table

Overview..
 Languages can also be classified by generation.

 1st generation programming language (1GL)


 Architecture-specific binary delivered on switches, patch panels and/or tape.

 2nd generation programming language (2GL)


 Assembly language; most commonly used on RISC, CISC and x86, as that is what our embedded systems and desktop computers use.

Overview...
 3rd generation programming language (3GL)
 C, C++, C#, Java, Basic, COBOL, Lisp and ML.

 4th generation programming language (4GL)


 SQL, SAS, R, MATLAB's GUIDE, ColdFusion, CSS. 

 5th generation programming language (5GL)


 Prolog, Mercury. 

Overview...
 Modeling in Compiler Design

 Compiler design is one of the places where theory has had the most
impact on practice.

 Models that have been found useful include automata, grammars, regular expressions, trees, and many others.

Overview…
 Optimization is to produce code that is more efficient than the
obvious code.

 Compiler optimizations must meet the following design objectives:

 The optimization must be correct, that is, preserve the meaning of the
compiled program.
 The optimization must improve the performance of many programs.
 The compilation time must be kept reasonable.

TODAY’S LESSON

Contents
 Syntax Directed Translator

 Introduction

 Syntax Definition
 Context Free Grammars
 Derivations
 Parse Trees
 Ambiguity
 Associativity of Operators
 Operator Precedence

Syntax Directed Translator
 This section illustrates compiling techniques by developing a program that translates representative programming language statements into three-address code, an assembly-like intermediate representation.

 We will focus on
 Front end of a compiler
 Lexical analysis
 Parsing
 Intermediate code generation.

ROUGH
 Background: The parser uses a CFG (context-free grammar) to validate the input string and produce output for the next phase of the compiler.
 The output can be either a parse tree or an abstract syntax tree. To interleave semantic analysis with the syntax analysis phase of the compiler, we use syntax-directed translation.

 interleave = insert pages, typically blank ones, between the pages of (a book).

Syntax Directed Translator..

Model of a Compiler Front End

Introduction
 Analysis is organized around the "syntax" of the language to be
compiled.
 The syntax of a programming language describes the proper form of its
programs.
 The semantics of the language defines what its programs mean.

 For specifying syntax, a context-free grammar is used.
 The notation is also known as BNF (Backus-Naur Form).

 We start with a syntax-directed translation of an infix expression to postfix form.
 Infix form: 9 - 5 + 2 to Postfix form: 9 5 - 2 +
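The translation of 9 - 5 + 2 into 9 5 - 2 + can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name infix_to_postfix is hypothetical, operands are single digits, and the only operators are + and -. Each operator is emitted right after its right operand, which is what produces the postfix order.

```python
def infix_to_postfix(text: str) -> str:
    """Translate single digits separated by + or - into postfix form."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def digit() -> str:
        # Consume one digit token or report a syntax error.
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"digit expected at token {pos}")

    output = [digit()]              # the first operand
    while pos < len(tokens):
        op = tokens[pos]
        if op not in ("+", "-"):
            raise SyntaxError(f"'+' or '-' expected at token {pos}")
        pos += 1
        output.append(digit())      # emit the right operand first ...
        output.append(op)           # ... then the operator
    return " ".join(output)


print(infix_to_postfix("9 - 5 + 2"))  # 9 5 - 2 +
```

The loop handles one "operator, digit" pair at a time, so operators of equal precedence are applied from left to right.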

ROUGH
 Infix: Consider an arithmetic expression such as B * C. In this case we know that the variable B is being multiplied by the variable C, since the multiplication operator * appears between them in the expression. This type of notation is referred to as infix since the operator is in between the two operands that it is working on.

 Postfix: Consider another infix example, A + B * C. The operators + and * still appear between the operands, but there is a problem. Which operands do they work on? Does the + work on A and B or does the * take B and C? The expression seems ambiguous.
ROUGH
 Let’s interpret the troublesome expression A + B * C using operator precedence. B and C are multiplied first, and A is then added to that result. (A + B) * C would force the addition of A and B to be done first, before the multiplication. In the expression A + B + C both operators have the same precedence, so associativity decides: the leftmost + is done first.

 There are two other very important expression formats that may not
seem obvious to you at first. Consider the infix expression A + B. What
would happen if we moved the operator before the two operands? The
resulting expression would be + A B. Likewise, we could move the
operator to the end. We would get A B +. These look a bit strange.
 These changes to the position of the operator with respect to the
operands create two new expression formats, prefix and postfix. Prefix
expression notation requires that all operators precede the two
operands that they work on. Postfix, on the other hand, requires that
its operators come after the corresponding operands.
ROUGH
 In postfix, A + B * C would be written as A B C * +. The order of operations is preserved since the * appears immediately after the B and the C, denoting that * has precedence, with + coming after. Although the operators moved and now appear either before or after their respective operands, the order of the operands stayed exactly the same relative to one another.

 A stack is used here, i.e. push and pop operations.
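One place where the push and pop operations show up is when a postfix expression is evaluated. The sketch below is a minimal illustration, not part of the original slides: the name eval_postfix is hypothetical and tokens are assumed to be separated by spaces. Operands are pushed; an operator pops its two operands and pushes the result.

```python
def eval_postfix(expression: str) -> float:
    """Evaluate a space-separated postfix expression using a stack."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split():
        if token in operations:
            right = stack.pop()          # the top of the stack is the right operand
            left = stack.pop()
            stack.append(operations[token](left, right))
        else:
            stack.append(float(token))   # operands are pushed as numbers
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("4 5 6 * +"))  # 34.0
print(eval_postfix("9 5 - 2 +"))  # 6.0
```

No parentheses or precedence rules are needed during evaluation, because the position of each operator already fixes the order of operations.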

ROUGH

ROUGH
4 + 5 * 6  →  4 5 6 * +

ROUGH

Infix Expression       Prefix Expression    Postfix Expression

A + B * C + D          + + A * B C D        A B C * + D +
(A + B) * (C + D)      * + A B + C D        A B + C D + *
A * B + C * D          + * A B * C D        A B * C D * +
A + B + C + D          + + + A B C D        A B + C + D +
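The postfix form of infix expressions like the ones in this table can be computed mechanically. The sketch below uses the shunting-yard algorithm, a standard stack-based technique that the slides do not name, so treat it as an assumption about how the conversion might be done. The names PRECEDENCE and to_postfix are hypothetical, operands are single letters or digits, and all four operators are taken to be left-associative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(infix: str) -> str:
    """Convert an infix expression with single-character operands to postfix."""
    output, stack = [], []
    for token in infix.replace(" ", ""):
        if token.isalnum():
            output.append(token)            # operands go straight to the output
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":         # unwind to the matching parenthesis
                output.append(stack.pop())
            stack.pop()                     # discard the "("
        else:
            # Pop operators of higher or equal precedence (left associativity).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
    while stack:
        output.append(stack.pop())
    return " ".join(output)


for expression in ["A+B*C+D", "(A+B)*(C+D)", "A*B+C*D", "A+B+C+D"]:
    print(expression, "->", to_postfix(expression))
```

The sketch does no error handling: an unknown operator or an unbalanced parenthesis raises a KeyError or IndexError instead of a friendly message.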

Syntax Definition
 A context-free grammar is used to specify the syntax of the language.
 For short, we simply call it a “grammar”.

 A grammar describes the hierarchical structure of most programming language constructs.

 Ex.
if ( expression ) statement else statement

Syntax Definition..
 This rule can be expressed as a production by using the variable expr to denote an expression and the variable stmt to denote a statement.
stmt -> if ( expr ) stmt else stmt

 In a production
 lexical elements like the keywords if and else and the parentheses are called terminals.
 Variables like expr and stmt represent sequences of terminals and are called nonterminals.

Grammars
 A context-free grammar has four components

 A set of tokens (terminal symbols)


 A set of nonterminals
 A set of productions
 A designated start symbol

 Let's look at an example that elaborates these components.

Grammars..
 Expressions …
9–5+2, 5–4, 8…
 Since a plus or minus sign must appear between two digits, we refer
to such expressions as lists of digits separated by plus or minus signs.

 The productions are

list -> list + digit      P-1
list -> list - digit      P-2
list -> digit             P-3
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9      P-4

A list may contain one or more digits.

Grammars..
 Terminals
+, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

 Non-Terminals
list , digit

 Designated Start Symbol


list
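The four components of this example grammar can be written down as a small data structure. The sketch below is one possible encoding, not the only one: the names Grammar, LIST_GRAMMAR and is_well_formed are hypothetical, and the signs + and - are counted as terminals alongside the digits because they appear in the productions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # tokens
    nonterminals: frozenset
    productions: dict         # head -> list of bodies (each a list of symbols)
    start: str                # designated start symbol


DIGITS = "0123456789"

LIST_GRAMMAR = Grammar(
    terminals=frozenset("+-" + DIGITS),
    nonterminals=frozenset({"list", "digit"}),
    productions={
        "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
        "digit": [[d] for d in DIGITS],
    },
    start="list",
)


def is_well_formed(grammar: Grammar) -> bool:
    """Check that the four components are consistent with each other."""
    symbols = grammar.terminals | grammar.nonterminals
    return (
        grammar.start in grammar.nonterminals
        and grammar.terminals.isdisjoint(grammar.nonterminals)
        and set(grammar.productions) <= grammar.nonterminals
        and all(symbol in symbols
                for bodies in grammar.productions.values()
                for body in bodies
                for symbol in body)
    )


print(is_well_formed(LIST_GRAMMAR))  # True
```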

Derivations
 Given a CF grammar we can determine the set of all strings
(sequences of tokens) generated by the grammar using derivation.

 We begin with the start symbol

 In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal.

Derivations..
 Derivation for our example expression.

list                          Start Symbol
=> list + digit               P-1
=> list - digit + digit       P-2
=> digit - digit + digit      P-3
=> 9 - digit + digit          P-4
=> 9 - 5 + digit              P-4
=> 9 - 5 + 2                  P-4

 This is an example of a leftmost derivation, because we replaced the leftmost nonterminal in each step.
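The derivation above can be replayed step by step in code. In this sketch the names NONTERMINALS and leftmost_derivation are hypothetical, and the production to apply at each step is supplied by hand rather than chosen by a parser. Each step finds the leftmost nonterminal in the sentential form and replaces it with the chosen right-hand side.

```python
NONTERMINALS = {"list", "digit"}


def leftmost_derivation(start, steps):
    """Apply productions (head, body) in order, always at the leftmost nonterminal."""
    form = [start]
    history = [" ".join(form)]
    for head, body in steps:
        index = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if index is None or form[index] != head:
            raise ValueError(f"cannot apply a production for {head!r} here")
        form[index:index + 1] = body       # replace the nonterminal by the body
        history.append(" ".join(form))
    return history


steps = [
    ("list", ["list", "+", "digit"]),   # P-1
    ("list", ["list", "-", "digit"]),   # P-2
    ("list", ["digit"]),                # P-3
    ("digit", ["9"]),                   # P-4
    ("digit", ["5"]),                   # P-4
    ("digit", ["2"]),                   # P-4
]

for sentential_form in leftmost_derivation("list", steps):
    print(sentential_form)
```

The last line printed is 9 - 5 + 2, a sentential form that contains only terminals.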

Parse Trees
 Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar.
 If the string cannot be derived from the start symbol of the grammar, the parser reports syntax errors within the string.

 Given a context-free grammar, a parse tree according to the grammar is a tree with the following properties:
 The root is labeled by the start symbol.
 Each leaf is labeled by a terminal or by ɛ.
 Each interior node is labeled by a nonterminal.
 If A -> X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal, a nonterminal, or ɛ.

Parse Trees..
Parse tree of the string 9-5+2 using grammar G

                list
             /   |    \
         list    +    digit
       /  |   \         |
   list   -   digit     2
     |          |
   digit        5
     |
     9

The sequence of leaves, read from left to right (9 - 5 + 2), is called the yield of the parse tree.
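The parse tree for 9-5+2 and its yield can be modelled with a small node class. This is a sketch under assumptions: the names Node, tree_yield and digit are hypothetical, and the tree is built by hand following the productions list -> list + digit, list -> list - digit and list -> digit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node: Node) -> list:
    """Collect the leaf labels from left to right."""
    if not node.children:
        return [node.label]
    leaves = []
    for child in node.children:
        leaves.extend(tree_yield(child))
    return leaves


def digit(value: str) -> Node:
    # A digit node with a single terminal leaf below it.
    return Node("digit", [Node(value)])


tree = Node("list", [
    Node("list", [
        Node("list", [digit("9")]),
        Node("-"),
        digit("5"),
    ]),
    Node("+"),
    digit("2"),
])

print(" ".join(tree_yield(tree)))  # 9 - 5 + 2
```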

Tree Terminology
 A tree consists of one or more nodes.
 Exactly one is the root.

 If node N is the parent of node M, then M is a child of N.


 The children of one node are called siblings.
 They have an order, from the left.

 A node with no children is called a leaf.


 A descendant of a node N is either N itself, a child of N, a child of a
child of N, and so on.

Ambiguity
 A grammar can have more than one parse tree generating a given
string of terminals.
 Such a grammar is said to be ambiguous.

 To show that a grammar is ambiguous, all we need to do is find a terminal string that is the yield of more than one parse tree.

Ambiguity..
 Consider the Grammar
G = [ {string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string ]

 Its productions are

string -> string + string | string - string | 0 | 1 | … | 9

 This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.

Ambiguity…

            string                          string
         /    |    \                     /    |    \
    string    +    string           string    -    string
   /   |   \          |               |          /   |   \
string  -  string     2               9     string   +   string
  |          |                                 |            |
  9          5                                 5            2

Two Parse Trees for 9 - 5 + 2
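The two parse trees can also be found by brute force. The sketch below enumerates every tree that the ambiguous grammar allows for a token list and evaluates each one. The names parse_trees and evaluate are hypothetical, and trees are represented as nested tuples (left, operator, right) purely for illustration.

```python
def parse_trees(tokens):
    """All parse trees for: string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "-"):
            # Try this operator as the root and parse both sides recursively.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the value a parse tree assigns to the expression."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b


for tree in parse_trees(list("9-5+2")):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to different values, 2 for 9-(5+2) and 6 for (9-5)+2, which is why ambiguity matters for the meaning of a program.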

Associativity of Operators
 Left-associative operators have left-recursive productions
 For instance
list -> list - digit | digit
String 9-5-2 has the same meaning as (9-5)-2

 Right-associative operators have right-recursive productions
 For instance, see the grammar below
right -> letter = right | letter
String a=b=c has the same meaning as a=(b=c)
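The difference between the two kinds of recursion can be made visible by grouping operands explicitly. In this sketch the function names group_left and group_right are hypothetical. The loop in group_left mirrors a left-recursive production, and the recursive call in group_right mirrors a right-recursive one.

```python
from functools import reduce


def group_left(operands, op):
    """Group as a left-associative operator would: ((a op b) op c)."""
    result = operands[0]
    for operand in operands[1:]:
        result = f"({result}{op}{operand})"
    return result


def group_right(operands, op):
    """Group as a right-associative operator would: (a op (b op c))."""
    if len(operands) == 1:
        return operands[0]
    return f"({operands[0]}{op}{group_right(operands[1:], op)})"


print(group_left(["9", "5", "2"], "-"))        # ((9-5)-2)
print(group_right(["a", "b", "c"], "="))       # (a=(b=c))
print(reduce(lambda a, b: a - b, [9, 5, 2]))   # 2, the value of (9-5)-2
```

Grouping the subtraction to the right instead would give 9-(5-2) = 6, so the choice of recursion in the grammar changes the result.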

Associativity of Operators..

Operator Precedence
 Consider the expression 9+5*2.

 There are two possible interpretations of this expression:
(9+5)*2 or 9+(5*2)

 The associativity rules for + and * apply to occurrences of the same operator, so they do not resolve this ambiguity.

 A grammar for arithmetic expressions can be constructed from a table showing the associativity and precedence of operators.

Operator Precedence..
 Let's look at an example of four common arithmetic operators and a precedence table, showing the operators in order of increasing precedence.
left-associative: + -
left-associative: * /

 Now we create two nonterminals expr and term for the two levels of precedence, and an extra nonterminal factor for generating basic units in expressions.

 The basic units in expressions are presently digits and parenthesized expressions.

factor -> digit | ( expr )

Operator Precedence..
 Now consider the binary operators, * and /, that have the highest precedence and left associativity.
term -> term * factor | term / factor | factor
 Similarly, expr generates lists of terms separated by the additive operators.
expr -> expr + term | expr - term | term

 Final grammar is
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> digit | ( expr )
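The three nonterminals of the final grammar map naturally onto three mutually recursive functions. The following is a minimal recursive-descent sketch, not the method the slides prescribe: the name evaluate_expr is hypothetical, operands are single digits, and the left-recursive productions are implemented with loops, a common way to keep left associativity without infinite recursion.

```python
def evaluate_expr(text: str):
    """Evaluate an expression using the expr / term / factor grammar."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():                       # factor -> digit | ( expr )
        token = peek()
        if token is not None and token.isdigit():
            return int(advance())
        if token == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("')' expected")
            advance()
            return value
        raise SyntaxError(f"unexpected token {token!r}")

    def term():                         # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():                         # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate_expr("9+5*2"))    # 19
print(evaluate_expr("(9+5)*2"))  # 28
print(evaluate_expr("9-5-2"))    # 2
```

Because term is called from inside expr, every multiplication or division is completed before the surrounding addition or subtraction, which is how the grammar encodes precedence.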

Operator Precedence..
 Ex. String 2+3*5 has the same meaning as 2+(3*5)

                 expr
             /    |    \
          expr    +    term
           |         /   |   \
          term    term   *   factor
           |        |           |
         factor   factor      digit
           |        |           |
         digit    digit         5
           |        |
           2        3

Associativity & Precedence Table

Thank You
