You are on page 1of 97

PROGRAMMING

LANGUAGES
ECEg - 4182
by
Dr. T.R.SRINIVASAN, B.E., M.E., Ph.D.,
Assistant Professor,
Faculty of Electrical and Computer Engineering,
Jimma Institute of Technology,
Jimma University,
Jimma, Ethiopia

1
Defining Program Syntax and
semantics

Chapter 2
Defining a Programming Language
• Defining a programming language requires
specifying its syntax and its semantics.
• Syntax:
• The form or structure of the expressions, statements,
and program units.
• Example: if (<exp>) then <statement>
• Semantics:
• The meaning of the expressions, statements, and
program units.
• Example: if the value of <exp> is non-zero, then
<statement> is executed otherwise omitted.

3
Syntax and Semantics
• There is universal agreement on how to
express syntax.

• BNF is the notation.

• Backus-Naur Form (BNF)


• Defined by John Backus and Peter Naur as a way
to characterize Algol syntax (it worked.)

4
Who needs language definitions?
• Other language designers
• To evaluate whether or not the language
requires changes before its initial
implementation and use.
• Programmers
• To understand how to use the language to solve
problems.
• Implementers
• To understand how to write a translator for the
language into machine code (compiler)

5
Language Sentences
• A sentence is a string of characters over some alphabet.

• A language is a set of sentences.

• Syntax rules specify whether or not any particular


sentence is defined within the language.

• Syntax rules do not guarantee that the sentence “makes


sense”!

6
Recognizers vs. Generators

• Syntax rules can be used for two purposes:

• Recognizers:
• Accept a sentence, and return true if the sentence is in the
language.
• Similar to syntactic analysis phase of compilers.

• Generators
• Push a button, and out pops a legal sentence in the language.

7
Definition of a BNF Grammar

• BNF Grammars have four parts:


• Terminals:
• the primitive tokens of the language ("a", "+", "begin",...)
• Non-terminals:
• Enclosed in "<" and ">", such as <prog>
• Production rules:
• A single non-terminal, followed by
• "->", followed by
• a sequence of terminals and non-terminals.
• The Start symbol:
• A distinguished nonterminal representing the “root” of the
language.

8
Definition of a BNF Grammar
• A set of terminal symbols
• Example: { "a" "b" "c" "(" ")" "," }
• A set of non-terminal symbols
• Example: { <prog> <stmt> }
• A set of productions
• Syntax: A single non-terminal, followed by a
"->", followed by a sequence of terminals and non-
terminals.
• Example: <prog> -> "begin" <stmt_list> "end"
• A distinguished non-terminal, the Start Symbol
• Example: <prog>

9
Example BNF Grammar
• Productions
• <prog> -> "begin" <stmt_list> "end"
• <stmt_list> -> <stmt>
• <stmt_list> -> <stmt> ";" <stmt_list>
• <stmt> -> <var> ":=" <exp>
• <var> -> "a"
• <var> -> "b"
• <var> -> "c"
• <exp> -> <var> "+" <var>
• <exp> -> <var> "-" <var>
• <exp> -> <var>

10
Extended BNF
• EBNF extends BNF syntax to make grammars more
readable.
• EBNF does not make BNF more expressive, it’s a
short-hand.
• Sequence :
• <if> -> "if "<test> "then" <stmt>
• Optional – [ ] :
• <if> -> "if "<test> "then" <stmt> ["else" <stmt>]
• Alternative – | :
• <number> -> <integer> | <real>
• Group – ( ) :
• <exp> -> <var> | ( <var> "+" <var> )
• Repetition – { }:
• <ident_list> -> <ident> { "," <ident>}
11
Generation

• A grammar can be used to generate a sentence:


• Choose a production with the start symbol as its LHS
(left-hand side).
• Write down the RHS as the sentence-to-be.
• For each non-terminal in the sentence-to-be:
• Choose a production with this non-terminal as its LHS
• Substitute the production’s RHS for the non-terminal
• Keep going until only terminal symbols remain. The
result is a legal sentence in the grammar.

12
Example sentence generation
• begin <stmt_list> end
• begin <stmt> end
• begin <var> := <exp> end
• begin b := <exp> end
• begin b := <var> end
• begin b := c end

• Sentence generation is also known as “derivation”

• Derivationcan be represented graphically as a


“parse tree”.
13
Example Parse Tree
• <prog>

begin <stmt_list> end

<stmt>

<var> := <exp>

<var>
b

c
14
Recognition
• Grammar can also be used to test if a sentence is in
the language. This is “recognition”.

• One form of recognizer is a “parser”, which


constructs a parse tree for a given input string.

• Programs exist that automatically construct a parser


given a grammar (example: yacc)
• Not all grammars are suitable for yacc.

• Depending on the grammar, parsers can be either


“top-down” or “bottom-up”.
15
Attribute Grammars: we stop
here
• Context-free grammars (CFGs) cannot describe
all of the syntax of programming languages
• Additions to CFGs to carry some semantic info
along parse trees
• Primary value of attribute grammars (AGs)
• Static semantics specification
• Compiler design (static semantics checking)

16
Attribute Grammars: Definition

• Attribute Grammar is a generalization of a


context-free grammar in which:
• Each grammar symbol is associated with a set of
attributes.

• This set of attributes for a grammar symbol is partitioned


in to two subsets called synthesized and inherited
attributes of that grammar symbol.

• Each production rule is associated with a set of semantic


rules.
17
Attribute Grammars: Definition

• An attribute can represent anything we choose:


• a string, a number, a type, a memory location,
intermediate program representation etc...
• The value of a synthesized attribute is computed
from the values of attributes at the children of that
node in the parse tree.

• The value of an inherited attribute is computed from


the values of attributes at the siblings and parent of
that node in the parse tree.
18
19
• Semantic rules set up dependencies between
attributes which can be represented by a
dependency graph.
• This dependency graph determines the
evaluation order of these semantic rules.
• Evaluation of a semantic rule defines the
value of an attribute.

20
Annotated Parse Tree

• A parse tree showing the values of attributes at


each node is called an annotated parse tree.
• The process of computing the attributes values
at the nodes is called annotating (or decorating)
of the parse tree.
• Of course, the order of these computations
depends on the dependency graph induced by
the semantic rules.

21
Basic Idea of Attribute Grammars
• Take a BNF parse tree and add values to
nodes.

• Pass values up and down tree to communicate


syntax information from one place to another.

• Attach “semantic rules” to each production


rule that describe constraints to be satisfied.

22
Attribute Grammar Example
• This is not a “real” example.

• BNF:
• <proc> -> procedure <proc_name>
<proc_body>
end <end_name>;
• Semantic rule:
• <proc_name>.string = <end_name>. string
• Attributes:
• A “string” attribute value is computed and attached to
<proc_name> and <end_name> during parsing.

23
Syntax And Semantics

• Programming language syntax: how


programs look, their form and structure
• Syntax is defined using a kind of formal
grammar
• Programming language semantics: what
programs do, their behavior and meaning

24
An English Grammar
A sentence is a noun <S> ::= <NP> <V> <NP>
phrase, a verb, and a
noun phrase.

A noun phrase is an <NP> ::= <A> <N>


article and a noun.

A verb is… <V> ::= loves | hates|eats

An article is… <A> ::= a | the

A noun is... <N> ::= dog | cat | rat

25
How The Grammar Works
• The grammar is a set of rules that say how
to build a tree—a parse tree
• You put <S> at the root of the tree
• The grammar’s rules say how children can
be added at any point in the tree
• For instance, the rule
<S> ::= <NP> <V> <NP>
says you can add nodes <NP>, <V>, and
<NP>, in that order, as children of <S>
26
A Parse Tree
<S>

<NP> <V> <NP>

<A> <N> loves <A> <N>

the dog the cat

27
A Programming Language Grammar

<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> )


| a | b | c

• An expression can be the sum of two


expressions, or the product of two expressions,
or a parenthesized subexpression
• Or it can be one of the variables a, b or c

28
A Parse Tree
<exp>

( <exp> )
((a+b)*c)
<exp> * <exp>
( <exp> ) c

<exp> + <exp>
a b

29
Sample Grammar

• Language: Parenthesized sums of 0’s


and 1’s

• <Sum> ::= 0
• <Sum >::= 1
• <Sum> ::= <Sum> + <Sum>
• <Sum> ::= (<Sum>)

30
Sample Grammar
• Terminals: 0 1 + ( )
• Non-terminals: <Sum>
• Start symbol = <Sum>
<Sum> ::= 0
<Sum >::= 1
<Sum> ::= <Sum> + <Sum>
<Sum> ::= (<Sum>)
Can be abbreviated as (using EBNF)
<Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)
31
BNF Derivations

• Generate ( 0 + 1 ) + 0

• Start with the start symbol:

<Sum> =>

32
BNF Derivations

• Pick a non-terminal

<Sum> =>

33
BNF Derivations

•Pick a rule and substitute:


• <Sum> ::= <Sum> + <Sum>

<Sum> => <Sum> + <Sum >

34
BNF Derivations
• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

35
BNF Derivations
•Pick a rule and substitute:
• <Sum> ::= ( <Sum> )

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>

36
BNF Derivations
• Pick a non-terminal:

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>

37
BNF Derivations
•Pick a rule and substitute:
• <Sum> ::= <Sum> + <Sum>

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>

38
BNF Derivations

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>

39
BNF Derivations
•Pick a rule and substitute:
• <Sum >::= 1

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>

40
BNF Derivations
• Pick a non-terminal:

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>

41
BNF Derivations
•Pick a rule and substitute:
• <Sum >::= 0

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>
=> ( <Sum> + 1 ) + 0

42
BNF Derivations
• Pick a non-terminal:

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>
=> ( <Sum> + 1 ) + 0

43
BNF Derivations
•Pick a rule and substitute
• <Sum> ::= 0

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>
=> ( <Sum> + 1 ) + 0
=> ( 0 + 1 ) + 0
44
BNF Derivations
• ( 0 + 1 ) + 0 is generated by grammar

<Sum> => <Sum> + <Sum >


=> ( <Sum> ) + <Sum>
=> ( <Sum> + <Sum> ) + <Sum>
=> ( <Sum> + 1 ) + <Sum>
=> ( <Sum> + 1 ) + 0
=> ( 0 + 1 ) + 0
45
BNF Grammar Definition
• A BNF grammar consists of four parts:

• The set of tokens


• The set of non-terminal symbols
• The start symbol
• The set of productions

46
start symbol
<S> ::= <NP> <V> <NP>

a production
<NP> ::= <A> <N>

<V> ::= loves | hates|eats

<A> ::= a | the


non-terminal
symbols <N> ::= dog | cat | rat

tokens
47
Example
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> )
| a | b | c

Note that there are six productions in this grammar.


It is equivalent to this one:

<exp> ::= <exp> + <exp>


<exp> ::= <exp> * <exp>
<exp> ::= ( <exp> )
<exp> ::= a
<exp> ::= b
<exp> ::= c
48
Empty

• The special non-terminal <empty> is for


places where you want the grammar to
generate nothing
• For example, this grammar defines a typical
if-then construct with an optional else part:

<if-stmt> ::= if <expr> then <stmt> <else-part>


<else-part> ::= else <stmt> | <empty>

49
Parse Trees

• To build a parse tree, put the start symbol at the


root
• Add children to every non-terminal, following
any one of the productions for that non-terminal
in the grammar
• Done when all the leaves are tokens
• Read off leaves from left to right—that is the
string derived by the tree
50
Compiler Note

• What we just did is parsing: trying to find a


parse tree for a given string
• That’s what compilers do for every program
you try to compile: try to build a parse tree for
your program, using the grammar for whatever
language you used

51
Language Definition
• We use grammars to define the syntax of
programming languages
• The language defined by a grammar is the
set of all strings that can be derived by
some parse tree for the grammar
• As in the previous example, that set is often
infinite
• Constructing grammars is a little like
programming...

52
Constructing Grammars
• Most important trick: divide and conquer
• Example: the language of Java declarations:
a type name, a list of variables separated by
commas, and a semicolon
• Each variable can be followed by an
initializer:
float a;
boolean a,b,c;
int a=1, b, c=1+2;

53
Example, Continued

• Easy if we postpone defining the comma-


separated list of variables with initializers:
<var-dec> ::= <type-name> <declarator-list> ;
• Primitive type names are easy enough too:
<type-name> ::= boolean | byte | short | int
| long | char | float | double

54
Example, Continued

• That leaves the comma-separated list of


variables with initializers
• Again, postpone defining variables with
initializers, and just do the comma-
separated list part:

<declarator-list> ::= <declarator>


| <declarator> , <declarator-list>

55
Example, Continued

• That leaves the variables with initializers:


<declarator> ::= <variable-name>
| <variable-name> = <expr>

• And definitions for <variable-name> and <expr>

56
BNF Semantics

• The meaning of a BNF grammar is


the set of all strings consisting only
of terminals that can be derived
from the Start symbol

57
Remember Parse Trees

• Graphical representation of a derivation


• Each node labeled with either a non-terminal
or a terminal
• If node is labeled with a terminal, then it is a
leaf (no sub-trees)
• If node is labeled with a terminal, then it has
one branch for each character in the right-
hand side of rule used to substitute for it

58
Example
• Consider grammar:
<exp> ::= <factor>
| <factor> + <factor>
<factor> ::= <bin>
| <bin> * <exp>
<bin> ::= 0 | 1

• Build parse tree for 1 * 1 + 0 as an <exp>

59
Example cont.
• 1 * 1 + 0: <exp>

<exp> is the start symbol for this parse


tree

60
Example cont.
• 1 * 1 + 0: <exp>
<factor>

Use rule: <exp> ::= <factor>

61
Example cont.
• 1 * 1 + 0: <exp>
<factor>
<bin> * <exp>

Use rule: <factor> ::= <bin> * <exp>

62
Example cont.
• 1 * 1 + 0: <exp>
<factor>
<bin> * <exp>
1 <factor> + <factor>

Use rules: <bin> ::= 1 and


<exp> ::= <factor> + <factor>
63
Example cont.
•• 1 * 1 + 0: <exp>
<factor>
<bin> * <exp>
1 <factor> + <factor>
<bin> <bin>

Use rule: <factor> ::= <bin>


64
Example cont.
• 1 * 1 + 0: <exp>
<factor>
<bin> * <exp>
1 <factor> + <factor>
<bin> <bin>
1 0
Use rules: <bin> ::= 1 | 0
65
Example cont.
• 1 * 1 + 0: <exp>
<factor>
<bin> * <exp>
1 <factor> + <factor>
<bin> <bin>
1 0
string generated by grammar
66
Where Syntax Meets Semantics

67
Three “Equivalent” Grammars
G1: <subexp> ::= a | b | c | <subexp> - <subexp>

G2: <subexp> ::= <var> - <subexp> | <var>


<var> ::= a | b | c

G3: <subexp> ::= <subexp> - <var> | <var>


<var> ::= a | b | c

These grammars all define the same language: the


language of strings that contain one or more as, bs
or cs separated by minus signs.

68
<subexp>

<var> - <subexp>

G2 parse tree: a <var> - <subexp>

b <var>

<subexp>

<subexp> - <var>

G3 parse tree: c
<subexp> - <var>

<var> b

69
Why Parse Trees Matter
• We want the structure of the parse tree to
correspond to the semantics of the string it
generates
• This makes grammar design much harder: we’re
interested in the structure of each parse tree, not
just in the generated string
• Parse trees are where syntax meets semantics

70
Outline

• Operators
• Precedence
• Associativity
• Ambiguities

71
Operators

• Special syntax for frequently-used simple


operations like addition, subtraction,
multiplication and division
• The word operator refers both to the token used to
specify the operation (like + and *) and to the
operation itself
• Usually predefined, but not always
• Usually a single token, but not always

72
Operator Terminology

• Operands are the inputs to an operator, like 1


and 2 in the expression 1+2
• Unary operators take one operand: -1
• Binary operators take two: 1+2
• Ternary operators take three: a?b:c

73
More Operator Terminology

• In most programming languages, binary operators


use an infix notation: a + b
• Sometimes you see prefix notation: + a b
• Sometimes postfix notation: a b +
• Unary operators, similarly:
• (Can’t be infix, of course)
• Can be prefix, as in --1
• Can be postfix, as in a++

74
Working Grammar

G4: <exp> ::= <exp> + <exp>


| <exp> * <exp>
| (<exp>)
| a | b | c

This generates a language of arithmetic expressions


using parentheses, the operators + and *, and the
variables a, b and c

75
Issue #1: Precedence
<exp>

<exp> * <exp>

<exp> + <exp> c

a b

Our grammar generates this tree for a+b*c. In this tree,


the addition is performed before the multiplication,
which is not the usual convention for operator precedence.

76
Operator Precedence

• Applies when the order of evaluation is not


completely decided by parentheses
• Each operator has a precedence level, and those
with higher precedence are performed before those
with lower precedence, as if parenthesized
• Most languages put * at a higher precedence level
than +, so that
a+b*c = a+(b*c)

77
Precedence In The Grammar
G4: <exp> ::= <exp> + <exp>
| <exp> * <exp>
| (<exp>)
| a | b | c

To fix the precedence problem, we modify the


grammar so that it is forced to put * below + in
the parse tree.

G5: <exp> ::= <exp> + <exp> | <mulexp>


<mulexp> ::= <mulexp> * <mulexp>
| (<exp>)
| a | b | c

78
Correct Precedence
<exp>

<exp> + <exp>

G5 parse tree: <mulexp> <mulexp>

a <mulexp> * <mulexp>

b c

Our new grammar generates this tree for a+b*c. It


generates the same language as before, but no longer
generates parse trees with incorrect precedence.

79
Issue #2: Associativity
<exp> <exp>

<exp> + <exp> <exp> + <exp>

<mulexp> <exp> + <exp> <exp> + <exp> <mulexp>

a <mulexp> <mulexp> <mulexp> <mulexp> c

b c a b

Our grammar G5 generates both these trees for


a+b+c. The first one is not the usual convention for
operator associativity.

80
Operator Associativity

• Applies when the order of evaluation is not decided


by parentheses or by precedence
• Left-associative operators group left to right: a+b+c+d
= ((a+b)+c)+d
• Right-associative operators group right to left: a+b+c+d
= a+(b+(c+d))
• Most operators in most languages are left-associative,
but there are exceptions

81
Associativity Examples
• C

• ML

• Fortran

a<<b<<c — most operators are left-associative


a=b=0 — right-associative (assignment)

3-2-1 — most operators are left-associative


1::2::nil — right-associative (list builder)

a/b*c — most operators are left-associative


a**b**c — right-associative (exponentiation)
82
Associativity In The Grammar
G5: <exp> ::= <exp> + <exp> | <mulexp>
<mulexp> ::= <mulexp> * <mulexp>
| (<exp>)
| a | b | c
To fix the associativity problem, we modify the grammar to
make trees of +s grow down to the left (and likewise for *s)

G6: <exp> ::= <exp> + <mulexp> | <mulexp>


<mulexp> ::= <mulexp> * <rootexp> | <rootexp>
<rootexp> ::= (<exp>)
| a | b | c

83
Correct Associativity
<exp>

<exp> + <mulexp>

<exp> + <mulexp> <rootexp>

<mulexp> <rootexp> c

<rootexp> b

Our new grammar generates this tree for a+b+c. It


generates the same language as before, but no longer
generates trees with incorrect associativity.
84
Ambiguous Grammars and
Languages

• A BNF grammar is ambiguous if its


language contains strings for which there
is more than one parse tree
• If all BNF’s for a language are ambiguous
then the language is inherently
ambiguous

85
Example: Ambiguous Grammar
• 0+1+0
<Sum> <Sum>
<Sum> + <Sum> <Sum> + <Sum>
<Sum> + <Sum> 0 0 <Sum> + <Sum>
0 1 1 0

86
Two Major Sources of
Ambiguity
• Lack of determination of operator
precedence
• Lack of determination of operator
associatively

• Not the only sources of ambiguity

87
Issue #3: Ambiguity
• G4 was ambiguous: it generated more than one
parse tree for the same string
• Fixing the associativity and precedence problems
eliminated all the ambiguity
• This is usually a good thing: the parse tree
corresponds to the meaning of the program, and
we don’t want ambiguity about that
• Not all ambiguity stems from confusion about
precedence and associativity...

88
Dangling Else In Grammars
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <stmt> else <stmt>
| if <expr> then <stmt>
<expr> ::= e1 | e2

This grammar has a classic “dangling-else ambiguity.” The


statement we want derived is

if e1 then if e2 then s1 else s2

and the next slide shows two different parse trees for it...

89
<if-stmt>

if <exp> then <stmt> else <stmt>

e1 <if-stmt> s2

if <exp> then <stmt>

e2 s1

<if-stmt>

if <exp> then <stmt>


Most languages that have
this problem choose this
e1 <if-stmt> parse tree: else goes with
nearest unmatched then
if <exp> then <stmt> else <stmt>

e2 s1 s2
Eliminating The Ambiguity
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <stmt> else <stmt>
| if <expr> then <stmt>
<expr> ::= e1 | e2

We want to insist that if this expands into an if, that if must


already have its own else. First, we make a new non-terminal
<full-stmt> that generates everything <stmt> generates, except
that it can’t generate if statements with no else:

<full-stmt> ::= <full-if> | s1 | s2


<full-if> ::= if <expr> then <full-stmt> else <full-stmt>

91
Eliminating The Ambiguity
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <full-stmt> else <stmt>
| if <expr> then <stmt>
<expr> ::= e1 | e2

Then we use the new non-terminal here.

The effect is that the new grammar can match an else part
with an if part only if all the nearer if parts are already
matched.

92
Correct Parse Tree
<if-stmt>

if <exp> then <stmt>

e1 <if-stmt>

if <exp> then <full-stmt> else <stmt>

e2 s1 s2

93
Conclusion
• Grammars define syntax, and more
• They define not just a set of legal programs, but a
parse tree for each program
• The structure of a parse tree corresponds to the
order in which different parts of the program are
to be executed
• Therefore, grammars contribute to the definition
of semantics

94
End of Syntax and Semantics

95
Quiz
• Try to figure out what kind of problem is found in
this grammar and try to fix it
<Sum> ::= 0
<Sum >::= 1
<Sum> ::= <Sum> + <Sum>
<Sum> ::= (<Sum>)
Can be abbreviated as (using EBNF)
<Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)

96
Solution

• The problem is associativity problem


• How to fix it?

<Sum> ::= <Sum> + <Sumterm> | <Sumterm>


<Sumterm> ::= (<Sum>) | 0 | 1

97

You might also like