
CSC 305: PROGRAMMING PARADIGM
CHAPTER 2: Introduction to Language, Syntax and Semantics

Contents
Describing languages (Sentences, Language, Lexeme and Tokens)
Describing Syntax (Language Recognizers, Generators, Grammars)
Describing Semantics (Operational, Axiomatic, Denotational)

Group Work 1.0 (Chapter 1)
Find ONE programming language for each of the paradigms (Imperative, Object-oriented, Logic, Functional).
Explain the language overview and design process for each of the languages.
Present your findings during the next class.

Languages
A method of communication.
Spoken and written languages can be described as a system of symbols (sometimes known as lexemes) and the grammars (rules) by which the symbols are manipulated.
A language is a set of sentences.
A sentence is a string of characters over some alphabet.

Programming language is ..
..a system of signs used to communicate a task/algorithm to a computer, causing the task to be performed. The task to be performed is called a computation, which follows absolutely precise and unambiguous rules.

Syntax and Semantics
Syntax is the form of a language's expressions, statements and program units.
Semantics is the meaning of those expressions, statements and program units.
Example: while statement in Java
while (boolean_expression) <statement>

Programming Language
Example of a program that adds two integers and prints: 1 + 1 = 2

#include <stdio.h>

/* Syntax for function add() */
int add(int x, int y)
{
    return x + y;
}

/* Syntax for function main() */
int main(void)
{
    int foo = 1, bar = 1;
    printf("%d + %d = %d\n", foo, bar, add(foo, bar));
    return 0;
}

Lexeme
..is the lowest-level syntactic unit of a language
Includes identifiers, literals, operators and special words
Example:
{
return x + y;
}

No.  Lexeme
1    {
2    return
3    x
4    +
5    y
6    ;
7    }

Token
..is a category of lexemes
Example:
{
return x + y;
}

No.  Lexeme  Token
1    {       open
2    return  keyword
3    x       identifier
4    +       plus op
5    y       identifier
6    ;       separator
7    }       close
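
The lexeme-to-token mapping above can be sketched as a tiny scanner. This is only an illustration: the names TokenKind, classify and tokenize are made up for this example, and it assumes that only the lexemes on the slide (plus ordinary identifiers) need to be handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories from the slide (enum names are illustrative). */
typedef enum {
    TOK_OPEN, TOK_CLOSE, TOK_KEYWORD, TOK_IDENTIFIER,
    TOK_PLUS_OP, TOK_SEPARATOR, TOK_UNKNOWN
} TokenKind;

/* Classify one lexeme (a NUL-terminated string) into a token category. */
TokenKind classify(const char *lexeme)
{
    if (strcmp(lexeme, "{") == 0) return TOK_OPEN;
    if (strcmp(lexeme, "}") == 0) return TOK_CLOSE;
    if (strcmp(lexeme, "+") == 0) return TOK_PLUS_OP;
    if (strcmp(lexeme, ";") == 0) return TOK_SEPARATOR;
    if (strcmp(lexeme, "return") == 0) return TOK_KEYWORD;
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') {
        for (const char *p = lexeme + 1; *p; ++p)
            if (!isalnum((unsigned char)*p) && *p != '_') return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }
    return TOK_UNKNOWN;
}

/* Split src into lexemes, store each lexeme's token category in out[],
   and return the number of tokens found (at most max). */
int tokenize(const char *src, TokenKind out[], int max)
{
    int n = 0;
    while (*src && n < max) {
        if (isspace((unsigned char)*src)) { ++src; continue; }
        char lexeme[64];
        size_t len = 0;
        if (isalpha((unsigned char)*src) || *src == '_') {
            /* A word: letters, digits and underscores form one lexeme. */
            while ((isalnum((unsigned char)*src) || *src == '_')
                   && len < sizeof lexeme - 1)
                lexeme[len++] = *src++;
        } else {
            lexeme[len++] = *src++;   /* any other character is a lexeme by itself */
        }
        lexeme[len] = '\0';
        out[n++] = classify(lexeme);
    }
    return n;
}
```

Calling tokenize("{ return x + y; }", out, 16) fills out[] with the seven categories in the table, in the same order.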

Language Recognizers
Determine whether given programs are in the language, i.e. syntactically correct.
Example: a compiler.
The syntax analyzer, also known as the parser, is the part of the compiler that does this.

Compiler
Program that converts the entire source program into machine language before it is executed.

Compiler process
Source code → Lexical Analyzer → Tokenized code → Syntactic Analyzer → Parsed code → Semantic Analyzer → Qualified code → Code Generator → Object code → Optimizer → Final code

Interpreter
Program that translates and executes one program code statement at a time.
Does not produce an object program.

Language Generators
Used to generate the sentences of a language.
The syntax of a sentence can be checked by comparing it with the structure of the generator.
The formal method for describing syntax is: grammars.

Grammars
Describe the syntax of a programming language.
Two closely related notations: context-free grammars, developed by Noam Chomsky, and Backus-Naur Form, developed by John Backus.
Grammar classes:
    Context-free grammars: describe the whole PL
    Regular grammars: describe the tokens of the PL

Context-Free grammars
Context-free grammars are powerful enough to describe the syntax of most programming languages; in practice, the syntax of most programming languages is specified using them.
Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar.
BNF (Backus-Naur Form) is the most common notation used to express context-free grammars.

Regular grammars
A regular grammar is a formal grammar.
The two main categories of formal grammar:
    generative grammars, which are sets of rules for how strings in a language can be generated
    analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.

Classification of grammars
The Chomsky (1959) hierarchy consists of the following:
    Type 0 grammar (unrestricted)
    Type 1 grammar (context-sensitive)
    Type 2 grammar (context-free)
    Type 3 grammar (regular)

Type 0 grammar (unrestricted)
An unrestricted grammar is a formal grammar $G = (N, \Sigma, P, S)$, where $N$ is a set of nonterminal symbols, $\Sigma$ is a set of terminal symbols, $N$ and $\Sigma$ are disjoint, $P$ is a set of production rules of the form $\alpha \to \beta$ where $\alpha$ and $\beta$ are strings of symbols in $N \cup \Sigma$ and $\alpha$ is not the empty string, and $S \in N$ is a specially designated start symbol. As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.
They generate exactly the languages that can be recognized by a Turing machine. These languages are also known as the recursively enumerable languages.

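Type 1 grammar (context-sensitive)
A formal grammar $G = (N, \Sigma, P, S)$ is context-sensitive if all rules in $P$ are of the form $\alpha A \beta \to \alpha \gamma \beta$, where $A \in N$, $\alpha$ and $\beta$ are strings over $N \cup \Sigma$, and $\gamma$ is a non-empty string over $N \cup \Sigma$.
The name context-sensitive is explained by the $\alpha$ and $\beta$ that form the context of $A$ and determine whether $A$ can be replaced with $\gamma$ or not. This is different from a context-free grammar, where the context of a nonterminal is not taken into consideration.
Generates the context-sensitive languages.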

Type 2 grammar (context-free grammar)
Generates the context-free languages.
Context-free languages are the theoretical basis for the syntax of most PL.
A context-free grammar $G$ can be defined as a 4-tuple $G = (V_t, V_n, P, S)$ where
    $V_t$ is a finite set of terminals
    $V_n$ is a finite set of non-terminals
    $P$ is a finite set of production rules
    $S$ is an element of $V_n$, the distinguished starting non-terminal.
Elements of $P$ are of the form $V_n \to (V_t \cup V_n)^*$.
Example:
S → x | y | z | S + S | S - S | S * S | S / S | ( S )
This grammar can, for example, generate the string
"( x + y ) * x - z * y / ( x + x )".

Type 3 grammar (regular)
In computer science a right regular grammar is a formal grammar $(N, \Sigma, P, S)$ such that all the production rules in $P$ are of one of the following forms:
    $A \to a$, where $A$ is a non-terminal in $N$ and $a$ is a terminal in $\Sigma$
    $A \to aB$, where $A$ and $B$ are in $N$ and $a$ is in $\Sigma$
    $A \to \varepsilon$, where $A$ is in $N$ and $\varepsilon$ denotes the empty string, i.e. the string of length 0.
In a left regular grammar, all rules obey the forms
    $A \to a$, where $A$ is a non-terminal in $N$ and $a$ is a terminal in $\Sigma$
    $A \to Ba$, where $A$ and $B$ are in $N$ and $a$ is in $\Sigma$
    $A \to \varepsilon$, where $A$ is in $N$ and $\varepsilon$ is the empty string.
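
A right regular grammar maps directly onto a finite-state recognizer: each non-terminal becomes a state and each rule of the form "A goes to aB" becomes a transition on the terminal a. The sketch below assumes a small made-up grammar (S → aS | bA, A → bA | ε) that is not from the slides; it generates strings of zero or more a's followed by one or more b's.

```c
#include <stdbool.h>

/* Recognizer for the hypothetical right regular grammar
       S -> aS | bA
       A -> bA | (empty string)
   Each non-terminal is a state; each rule is a transition. */
bool matches(const char *s)
{
    enum { STATE_S, STATE_A } state = STATE_S;
    for (; *s; ++s) {
        if (state == STATE_S && *s == 'a')      state = STATE_S;  /* S -> aS */
        else if (state == STATE_S && *s == 'b') state = STATE_A;  /* S -> bA */
        else if (state == STATE_A && *s == 'b') state = STATE_A;  /* A -> bA */
        else return false;                      /* no rule applies */
    }
    return state == STATE_A;   /* only A has the empty-string rule */
}
```

For example, matches("aab") and matches("b") are true, while matches("aba") and matches("") are false.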

Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
             | <var> - <var>
             | <var>
A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end.
An expression is either a single variable or two variables separated by either a + or a - operator. The only variable names are A, B and C.
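
A recognizer for this small grammar can be sketched by hand as a recursive-descent parser, with one function per non-terminal. The function names are invented for this example, and it assumes whitespace between lexemes is insignificant, which the grammar itself does not specify.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Every parse_* function returns the position just after the construct
   it recognized, or NULL if the input does not match. */

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p)) ++p;
    return p;
}

static const char *parse_word(const char *p, const char *word)
{
    size_t n = strlen(word);
    p = skip_ws(p);
    return strncmp(p, word, n) == 0 ? p + n : NULL;
}

static const char *parse_var(const char *p)         /* <var> -> A | B | C */
{
    p = skip_ws(p);
    return (*p == 'A' || *p == 'B' || *p == 'C') ? p + 1 : NULL;
}

static const char *parse_expression(const char *p)  /* <var> + <var> | <var> - <var> | <var> */
{
    p = parse_var(p);
    if (!p) return NULL;
    const char *q = skip_ws(p);
    if (*q == '+' || *q == '-') return parse_var(q + 1);
    return p;
}

static const char *parse_stmt(const char *p)        /* <stmt> -> <var> = <expression> */
{
    p = parse_var(p);
    if (!p) return NULL;
    p = parse_word(p, "=");
    return p ? parse_expression(p) : NULL;
}

static const char *parse_stmt_list(const char *p)   /* <stmt> | <stmt> ; <stmt_list> */
{
    p = parse_stmt(p);
    if (!p) return NULL;
    const char *q = skip_ws(p);
    if (*q == ';') return parse_stmt_list(q + 1);
    return p;
}

bool is_program(const char *src)                    /* <program> -> begin <stmt_list> end */
{
    const char *p = parse_word(src, "begin");
    if (p) p = parse_stmt_list(p);
    if (p) p = parse_word(p, "end");
    return p && *skip_ws(p) == '\0';
}
```

With this sketch, is_program("begin A = B + C; B = C end") is true, while is_program("begin A = B; end") is false because the semicolon separates statements rather than terminating them.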

Grammar Example
Derivation of A = B * (A + C)

<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )

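BNF
Introduced by John Backus and later refined by Peter Naur; nearly identical to the context-free grammars of Noam Chomsky.
A BNF specification is a set of derivation rules.
Context-free grammars:
    The syntax of a whole programming language can be described by a context-free grammar.
Fundamentals:
    BNF is a metalanguage (a language used to describe another language).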

Example of BNF
<postal-address> ::= <name-part> <street-address> <zip-part>
This translates into English as:
A postal address consists of a name part, followed by a street address part, followed by a zip-code part.

Example of BNF
<street-address> ::= [<apt>] <house-num> <street-name> <EOL>
This translates into English as:
A street address consists of an optional apartment specifier, followed by a house number, followed by a street name, followed by an end-of-line.

EBNF
Extended BNF addresses the drawbacks of BNF.
It increases the readability and writability of the production rules.
New notations:
    Braces { } represent sequences of zero or more instances of elements.
    Brackets [ ] enclose optional elements.
    Parentheses ( ) group elements.

Parse Tree
Naturally describes the hierarchical syntactic structure of the sentences of the language being defined.
Every internal node is labeled with a non-terminal symbol.
Every leaf is labeled with a terminal symbol.
Every subtree describes one instance of an abstraction in the sentence.

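Parse Tree Example
A = B * (A + C)

<assign>
|-- <id>
|   `-- A
|-- =
`-- <expr>
    |-- <id>
    |   `-- B
    |-- *
    `-- <expr>
        |-- (
        |-- <expr>
        |   |-- <id>
        |   |   `-- A
        |   |-- +
        |   `-- <expr>
        |       `-- <id>
        |           `-- C
        `-- )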

Grammar and Recognizers
A recognizer for the language generated by a grammar can be constructed algorithmically.
One of the first syntax analyzer generators is named yacc (yet another compiler-compiler) (Johnson, 1975).

Semantics
The meaning of words and other parts of a language.
Reveals the meaning of the syntax/grammar.
Categorized as follows:
    Static semantics
    Dynamic semantics

Static semantics
Described with an attribute grammar (AG), which is an extension of a context-free grammar (CFG).
An AG is a mechanism to formalize rules for both Context Free Grammar (CFG) and Context Sensitive Grammar (CSG).
An AG is used to define the static semantics of a language, for example its type compatibility rules.
Static semantics can be checked by the compiler at compile time.

Attribute Grammar
A = A + B
Syntax rule: <expr> → <var>[2] + <var>[3]
Semantic rule:
    <expr>.actual_type ←
        if (<var>[2].actual_type = int) and (<var>[3].actual_type = int)
        then int
        else real
        end if
Predicate: <expr>.actual_type == <expr>.expected_type
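
The semantic rule and predicate above can be sketched as two small functions. The type and function names (Type, expr_actual_type, expr_predicate) are invented for illustration, and only the two types mentioned on the slide, int and real, are assumed to exist.

```c
#include <stdbool.h>

/* The two types that appear in the rule on the slide. */
typedef enum { TYPE_INT, TYPE_REAL } Type;

/* Semantic rule: <expr>.actual_type is int only when both
   operands are int; otherwise it is real. */
Type expr_actual_type(Type var2, Type var3)
{
    return (var2 == TYPE_INT && var3 == TYPE_INT) ? TYPE_INT : TYPE_REAL;
}

/* Predicate: <expr>.actual_type == <expr>.expected_type */
bool expr_predicate(Type var2, Type var3, Type expected)
{
    return expr_actual_type(var2, var3) == expected;
}
```

For A = A + B with A declared int and B declared real, the expression's actual type is real, so the predicate fails if an int result is expected.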

Dynamic semantics
Done during run time.
Several ways to specify DS:
    By a language reference manual
        Common method of describing PL
    By a defining translator
        Common method of questioning the behavior of PL
    By a formal definition
        Common method of questioning the behavior of PL by using mathematical methods.
        Includes operational, axiomatic and denotational semantics.

Operational semantics
Example
The while structure in C programming:

while (expression)
    statement;

Might be defined as the following operations:
1. Evaluate the expression, yielding a value.
2. If the value is true, run the statement and repeat step 1.
3. If the value is false, terminate the while statement.
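
The operations listed above can be made concrete by rewriting a while loop with only a test and a jump. The example below is a sketch with invented function names (sum_while, sum_steps); both functions add the integers from 1 to n and are meant to behave identically.

```c
/* Sum 1..n written with an ordinary while statement. */
int sum_while(int n)
{
    int i = 1, total = 0;
    while (i <= n) {
        total += i;
        ++i;
    }
    return total;
}

/* The same loop spelled out as the operations that define it. */
int sum_steps(int n)
{
    int i = 1, total = 0;
evaluate:
    if (i <= n) {        /* evaluate the expression, yielding a value */
        total += i;      /* value is true: run the statement ...      */
        ++i;
        goto evaluate;   /* ... and repeat the evaluation             */
    }
    return total;        /* value is false: the while terminates      */
}
```

Both sum_while(5) and sum_steps(5) return 15, and both return 0 when n is 0 because the expression is false on its first evaluation.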

Axiomatic semantics
Example
Uses logical statements called assertions: a pre-condition before the program statement and a post-condition after it.

Pre-condition:      x = n, y = m            (e.g. x = 5, y = 7)
Program statement:  z = x; x = y; y = z;
Post-condition:     y = n, x = m            (e.g. x = 7, y = 5)
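
The pre-condition and post-condition of the swap can be checked at run time with assert, as a rough sketch. The function name swap_checked and the use of pointers are choices made for this example; axiomatic semantics itself proves the post-condition by logic rather than by executing the code.

```c
#include <assert.h>

/* Swap *x and *y through a temporary z, checking the assertions at run time. */
void swap_checked(int *x, int *y)
{
    const int n = *x, m = *y;     /* pre-condition:  x = n, y = m */
    int z;

    z = *x;                       /* program statements from the slide */
    *x = *y;
    *y = z;

    assert(*y == n && *x == m);   /* post-condition: y = n, x = m */
}
```

Starting from x = 5 and y = 7, calling swap_checked(&x, &y) leaves x = 7 and y = 5, and the assertion holds.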

Denotational semantics
Defines PL behavior by applying mathematical functions to programs and program components to represent their meaning.
The definition uses double brackets [[ ]] to separate the syntactic definition from the semantic definition.
Example
syntactic: the expressions 2*4, 5+3 and 008 all denote the integer 8
semantic:

[[2*4]] = [[5+3]] = [[008]] = [[8]]
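
The double-bracket mapping can be sketched as a function from syntax (a string) to meaning (an integer). The function name meaning is invented for this example, and it assumes the input is either a single decimal numeral or two numerals joined by + or * with no spaces.

```c
#include <stdlib.h>

/* Meaning function [[ . ]]: maps a numeral, or two numerals joined
   by + or *, to the integer that the expression denotes. */
long meaning(const char *expr)
{
    char *rest;
    long left = strtol(expr, &rest, 10);   /* base 10, so 008 denotes 8 */

    if (*rest == '+') return left + strtol(rest + 1, NULL, 10);
    if (*rest == '*') return left * strtol(rest + 1, NULL, 10);
    return left;                           /* a numeral on its own */
}
```

Here meaning("2*4"), meaning("5+3"), meaning("008") and meaning("8") all return 8: four different pieces of syntax with one meaning.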
