You are on page 1of 91

Lecture 3

Chapter 3
Describing Syntax & Semantics

Principles of Programming languages


TextBook

Principles of Programming
Languages, Robert W.
Sebesta, (10th edition),

Addison-Wesley Publishing
Company

- Principles of Programming languages


Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs:
• Dynamic Semantics

- Principles of Programming languages


Introduction
• Syntax: the form or structure of the expressions,
statements, and program units
• Semantics: the meaning of the expressions,
statements, and program units
• Syntax and semantics provide a language’s definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)

- Principles of Programming languages


- Principles of Programming languages
The General Problem of Describing Syntax:
Terminology
A sentence is a string of
characters over some
alphabet
A language is a set of
sentences
A lexeme is the lowest
level syntactic unit of a
language (e.g., *, sum,
begin)
A token is a
category of lexemes
(e.g., identifier)
Formal Definition of Languages
• Recognizers
– A recognition device reads input strings over the alphabet
of the language and decides whether the input strings
belong to the language
– Example: syntax analysis part of a compiler
• Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular sentence is
syntactically correct by comparing it to the structure of the
generator

- Principles of Programming languages


BNF and Context-Free Grammars
• Context-Free Grammars
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the syntax of
natural languages
– Define a class of languages called context-free languages
• Backus-Naur Form (1959)
– Invented by John Backus to describe Algol 58
– BNF is equivalent to context-free grammars

- Principles of Programming languages


BNF Fundamentals
• In BNF, abstractions are used to represent classes of syntactic structures --
they act like syntactic variables (also called nonterminal symbols, or just
terminals)
• Terminals are lexemes or tokens
• A rule has a left-hand side (LHS), which is a non-terminal, and a right-hand
side (RHS), which is a string of terminals and/or non-terminals
• Non-terminals are often enclosed in angle brackets
– Examples of BNF rules:

<ident_list> → identifier | identifier, <ident_list>


<if_stmt> → if <logic_expr> then <stmt>

• Grammar: a finite non-empty set of rules


• A start symbol is a special element of the nonterminals of a grammar

- Principles of Programming languages


BNF Rules
• An abstraction (or nonterminal symbol) can
have more than one RHS

<stmt> → <single_stmt>
| begin <stmt_list> end

- Principles of Programming languages


Describing Lists
• Syntactic lists are described using recursion
<ident_list> → ident | ident, <ident_list>
• A derivation is a repeated application of rules,
starting with the start symbol and ending with
a sentence (all terminal symbols)

- Principles of Programming languages


An Example Grammar

- Principles of Programming languages


Derivation

- Principles of Programming languages


An Example Grammar & Derivation

<program> → <stmts> <program> => <stmts>


 <stmts> => <stmt>
<stmts> → <stmt> | <stmt> ; <stmts>
 <var> = <expr>
<stmt> → <var> = <expr>  a = <expr>
 a = <term> + <term>
<var> → a | b | c | d  a = <var> + <term>
 a = b + <term>
<expr> → <term> + <term> |
<term> - <term>  a = b + const

<term> → <var> | const

- Principles of Programming languages


Question

How will this grammar generate the following statement,


written by a programmer ?
Solution

is generated by
An Example Derivation
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const

- Principles of Programming languages


Derivations
• Every string of symbols in a derivation is a
sentential form
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded

- Principles of Programming languages


Parse Tree
• A hierarchical representation of a derivation

- Principles of Programming languages


Ambiguity in Grammars
• A grammar is ambiguous if and only if it generates a sentential
form that has two or more distinct parse trees
• some string that it can generate in more than one way
• semantic information is needed to select the intended parse
• For example, in C the following:
– x * y ; can be interpreted as either:
• the declaration of an identifier named y of type pointer-to-x, or
• an expression in which x is multiplied by y and then the result is
discarded.

- Principles of Programming languages


An Ambiguous Expression Grammar

- Principles of Programming languages


An Unambiguous Expression Grammar
• If we use the parse tree to indicate precedence levels of the operators, we
cannot have ambiguity.

- Principles of Programming languages


• Parse Trees represent the inherent
hierarchical structures in a language sentence.
A parse tree and a derivation are
representations of the same process
Associativity of Operators

- Principles of Programming languages


Associativity of Operators

- Principles of Programming languages


- Principles of Programming languages
Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Converting Ambiguous Grammar Into Un-
Ambiguous Grammar

- Principles of Programming languages


Graded Activity: Questions

How to add precedence rules?

What is the problem with


ambiguity?

Is the C++ grammar


ambiguous?

Can you find a sentence with two possible parse trees in this
grammar?

- Principles of Programming languages


Lecture 4
Chapter 3
Describing Syntax & Semantics

Principles of Programming languages


Extended BNF
• Optional parts (0 or one) are placed in
brackets [ ]
<proc_call> → ident [(<expr_list>)]
• Alternative parts of RHSs are placed inside
parentheses and separated via vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> → letter {letter|digit}
- Principles of Programming languages
BNF and EBNF

- Principles of Programming languages


BNF and EBNF

- Principles of Programming languages


Question
• Convert the following grammar from BNF to
EBNF

- Principles of Programming languages


Answer

- Principles of Programming languages


Question
Convert the following BNF to EBNF

<assign> → <id> = <expr>


<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
- Principles of Programming languages
Answer

- Principles of Programming languages


Semantics
• BNF in the form of CFG is used to describe the syntax
of a programming language
• It can not describe the semantics (meaning) of the
programming language statements, expressions, etc.
• There are two categories of semantics:
– Static semantics: semantic rules that can be checked
during compile time (i.e., prior to runtime). As an example,
variables in a program must be "declared" before they are
referenced.
– Dynamic semantics: semantic rules that apply during the
execution of a program.

- Principles of Programming languages


Static Semantics Examples
There are some characteristics of the structure of programming languages that are
difficult to describe with BNF, and some that are impossible.

As an example of a syntax rule that is difficult to specify with BNF, consider type
compatibility rules. In Java, for example, a floating-point value cannot be assigned to
an integer type variable, although the opposite is legal. Although this restriction can
be specified in BNF, it requires additional non-terminal symbols and rules. If all of the
typing rules of Java were specified in BNF, the grammar would become too large to be
useful, because the size of the grammar determines the size of the syntax analyzer.

As an example of a syntax rule that cannot be specified in BNF, consider the common
rule that all variables must be declared before they are referenced. It has been proven
that this rule cannot be specified in BNF

- Principles of Programming languages


Attribute Grammar
• Attribute grammar is a special form of context-free grammar where some
additional information (attributes) are appended to one or more of its
non-terminals in order to provide context-sensitive information.
• Each attribute has well-defined domain of values, such as integer, float,
character, string, and expressions.

• Attribute grammar is a medium to provide semantics to the context-free


grammar and it can help specify the syntax and semantics of a
programming language.
• Attribute grammar (when viewed as a parse-tree) can pass values or
information among the nodes of a tree.

- Principles of Programming languages


Example:
• E → E + T { E.value = E.value + T.value }

• The right part of the CFG contains the


semantic rules that specify how the grammar
should be interpreted. Here, the values of
non-terminals E and T are added together and
the result is copied to the non-terminal E.

- Principles of Programming languages


Attribute Grammars

• Attribute grammars (AGs) have additions to


CFGs to carry some semantic info on parse
tree nodes

• Primary value of AGs:


– Static semantics specification
– Compiler design (static semantics checking)

- Principles of Programming languages


Attribute Grammars : Definition
• Def: An attribute grammar is a context-free
grammar G = (S, N, T, P) with the following
additions:
– For each grammar symbol x there is a set A(x) of
attribute values
– Each rule has a set of functions that define certain
attributes of the nonterminals in the rule
– Each rule has a (possibly empty) set of predicates
to check for attribute consistency

- Principles of Programming languages


Attribute Grammars (continued)
Attributes
– these are ‘variables’ associated with grammar symbols,
and can have various values attached to them.
Attribute computation functions
– these are functions associated with grammar rules
(productions) and specify how attributed values are
computed.
Predicate functions
– These are associated with grammar rules (productions)
and state some of the syntax and static semantic rules of
the language.
– Predicate functions, which state the static semantic rules
of the language, are associated with grammar rules.
Attribute Grammars: Definition
• Let X0  X1 ... Xn be a rule

• Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized


attributes

• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for i <= j <= n, define
inherited attributes

• Initially, there are intrinsic attributes on the leaves

• The first attributes to be evaluated are the synthetic attributes of the leaf
nodes: called intrinsic attributes: which might be values read-off the
symbol table, e.g. type values etc

- Principles of Programming languages


Attribute Grammars (continued)
Attributes

– actual_type: a synthetic attribute for non-terminals,


storing their actual type: int or real.
• In case of a variable intrinsic-attributes are read-off.
• In case of an expression the value of the attributed is
determined as a function of its children nodes.

– expected_type: an inherited attribute associated with


the non-terminal <expr>
• stores the type value expected of the <expr> given the type
value of the variable on the left hand side.
Attribute Grammars: An Example
• Syntax
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
<var> A | B | C
• actual_type: synthesized for <var> and
<expr>
• expected_type: inherited for <expr>

- Principles of Programming languages


- Principles of Programming languages
Attribute Grammar (continued)
• Syntax rule: <expr>  <var>[1] + <var>[2]
Semantic rules:
<expr>.actual_type  <var>[1].actual_type
Predicate:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type

• Syntax rule: <var>  id


Semantic rule:
<var>.actual_type  lookup (<var>.string)

- Principles of Programming languages


Attribute Grammar (continued)
Attributes

– actual_type: a synthetic attribute for non-terminals,


storing their actual type: int or real.
• In case of a variable intrinsic-attributes are read-off.
• In case of an expression the value of the attributed is
determined as a function of its children nodes.

– expected_type: an inherited attribute associated with


the non-terminal <expr>
• stores the type value expected of the <expr> given the type
value of the variable on the left hand side.
Attribute Grammars (continued)
• How are attribute values computed?
– If all attributes were inherited, the tree could be
decorated in top-down order.
– If all attributes were synthesized, the tree could be
decorated in bottom-up order.
– In many cases, both kinds of attributes are used,
and it is some combination of top-down and
bottom-up that must be used.

- Principles of Programming languages


Attribute Grammars (continued)
<expr>.expected_type  inherited from parent

<var>[1].actual_type  lookup (A)


<var>[2].actual_type  lookup (B)
<var>[1].actual_type =? <var>[2].actual_type

<expr>.actual_type  <var>[1].actual_type
<expr>.actual_type =? <expr>.expected_type

- Principles of Programming languages


Attribute Grammars (continued)
Example, parse tree for: A = A + B

Assuming this three has already been constructed, how do we compute the attributes?
Attribute Grammars (continued)
A fully attributed parse tree

- Principles of Programming languages


Attribute Grammar: Example 2
• Grammar of signed binary numbers and
computing the values of the number
Lecture 5
Chapter 3
Describing Syntax & Semantics

Principles of Programming languages


Semantics
• There is no single widely acceptable notation
or formalism for describing semantics
• Several needs for a methodology and
notation for semantics:
– Programmers need to know what statements mean
– Compiler writers must know exactly what language constructs do
– Correctness proofs would be possible
– Compiler generators would be possible
– Designers could detect ambiguities and inconsistencies

- Principles of Programming languages


Operational Semantics

• Operational Semantics
– Describe the meaning of a program by executing
its statements on a machine, either simulated or
actual. The change in the state of the machine
(memory, registers, etc.) defines the meaning of
the statement
• To use operational semantics for a high-level
language, a virtual machine is needed

- Principles of Programming languages


Operational Semantics

- Principles of Programming languages


Operational Semantics

- Principles of Programming languages


Operational Semantics

- Principles of Programming languages


Operational Semantics

- Principles of Programming languages


Operational Semantic Examples

- Principles of Programming languages


Operational Semantic Examples

- Principles of Programming languages


Denotational Semantics
• Based on recursive function theory
• The most abstract semantics description
method
• Originally developed by Scott and Strachey
(1970)
• In denotational semantics, the domain is
called the syntactic domain, because it is
syntactic structures that are mapped. The
range is called the semantic domain.
- Principles of Programming languages
Denotational Semantics
• The method is named denotational because
the mathematical objects denote the meaning
of their corresponding syntactic entities.

- Principles of Programming languages


Denotational Semantics

- Principles of Programming languages


Denotational Semantics

- Principles of Programming languages


Denotational Semantics

- Principles of Programming languages


Denotational Semantics

• This definition requires two auxiliary functions defined in the semantic world,
where Number x Number denotes the Cartesian product.

We need two syntactic domains for the language of numerals. Phrases in this language are mapped into the
mathematical domain of natural numbers. Generally we have one semantic function for each syntactic domain
and one semantic equation for each production in the abstract syntax. To distinguish numerals (syntax) from
numbers (semantics), different typefaces are employed. Note the compositionality of the definition in that the
value of a phrase “N D” is defined in terms of the value of N and the value of D.
- Principles of Programming languages
- Principles of Programming languages
Question
• Write a denotational semantics mapping
function for the following statements:
a. Ada for
b. Java do-while
c. Java Boolean expressions
d. Java for
e. C switch

- Principles of Programming languages


Axiomatic Semantics
• Based on formal logic (predicate calculus)
• Original purpose: formal program verification
• Axioms or inference rules are defined for each
statement type in the language (to allow
transformations of logic expressions into more
formal logic expressions)
• The logic expressions are called assertions

- Principles of Programming languages


Axiomatic Semantics
• An assertion before a statement (a precondition)
states the relationships and constraints among
variables that are true at that point in execution

• An assertion following a statement is a postcondition

• A weakest precondition is the least restrictive


precondition that will guarantee the postcondition

- Principles of Programming languages


Axiomatic Semantics Form

- Principles of Programming languages


Axiomatic Semantics Form

- Principles of Programming languages


Example

where I is the loop invariant (the inductive hypothesis)

- Principles of Programming languages


• In the case of the assignment statement, the
notation is

• example of computing a precondition for an


assignment statement, consider the following:
• x = 2 * y - 3 {x > 25}
• The precondition is computed as follows:
• 2 * y - 3 > 25
• y > 14
• So {y > 14} is the weakest precondition for this
assignment statement and postcondition.
- Principles of Programming languages
• . For example, consider the logical statement

- Principles of Programming languages


For example, it allows the completion of the proof of the last logical
statement .
If we let P be {x > 3}, Q and Q be {x > 0}, and P be {x > 5}, we have

- Principles of Programming languages


Denotation Semantics Vs Operational
Semantics

• In operational semantics, the state changes


are defined by coded algorithms

• In denotational semantics, the state changes


are defined by rigorous mathematical functions

- Principles of Programming languages


Question
• Compute the weakest precondition for each of
the following assignment statements and
postconditions:
• a = 2 * (b - 1) - 1 {a > 0}
• b = (c + 10) / 3 {b > 6}
• a = a + 2 * b - 1 {a > 1}
• x = 2 * y + x - 1 {x > 11}

- Principles of Programming languages


Answers
• a = 2 * (b - 1) - 1 {a > 0}
• 2 * (b - 1) - 1 > 0
• 2*b-2-1>0
• 2*b>3
• b>3/2

- Principles of Programming languages


Answer
• b = (c + 10) / 3 {b > 6}
• (c + 10) / 3 > 6
• c + 10 > 18
• c>8

- Principles of Programming languages


Answer
• a = a + 2 * b - 1 {a > 1}
• a+2*b-1>1
• 2*b>2-a
• b>1-a/2

- Principles of Programming languages


Answer
• x = 2 * y + x - 1 {x > 11}
• 2 * y + x - 1 > 11
• 2 * y + x > 12

- Principles of Programming languages

You might also like