A. PROGRAMMING LANGUAGES
Machine lang.: It is the native language of a computer, the notation to which the
computer responds directly. Each instruction is a series of 0’s and 1’s,
so programs in machine lang. are unintelligible to human readers.
Assembly Lang.: It is a low-level programming lang. consisting of symbolic
instructions. Each symbolic instruction is called a mnemonic and corresponds to a
machine instruction in binary.
High Level Lang.: It provides readable, familiar notation close to natural language. The
advantages of high level lang. include machine independence, availability of program
libraries, data types, structures and operations on primitive data.
Language Implementation on a machine: To bridge the gap between the high level
language and machine, we can use two different methods: Compilation and Interpretation.
Compiler: A translator that changes the program into a form suitable for execution is called a
compiler, and this process is called compilation. In compilation the source program is first
translated into a target (intermediate or machine) representation. The time taken for this is called
translation time. The translated program is then executed. The time required for this is called run time.
Figure 1: Compilation
Interpreter: Interpretation is also a way of implementing a language, but the interpreter executes the high
level source program directly instead of producing a separate executable form. i.e. it retrieves a statement,
determines the action and performs the action. So, there is no distinction between translation time and run time.
Figure 2: Interpretation
The compiler is more efficient than the interpreter. The compiler translates the program once
into a translated form and that code is then run. So, when the program is run again, only the
execution is repeated, not the translation. But, in case of an interpreter, the line by line
translation is part of execution and so it has to be repeated every time.
The interpreter can be more flexible: Since the interpreter examines the
source text line by line as it runs, it is easy to identify bugs and to modify the program with
less effort. With compilation, the program must be translated again after each change.
C. LANGUAGE DESCRIPTION
Language Descriptions
Precise specification of a programming language is essential for describing the computational
behavior of the language. In general, formal specifications of a programming lang. provide the
following:
Syntax Description:
Syntax refers to the formation of constructs in the lang. and defines the relation between them. It
describes the structure of the lang. without addressing the meaning of the constructs. Eg: int i;
is the syntax for declaring a variable in C, whereas int 2i; is invalid.
Abstract Syntax of a lang. identifies the meaningful components of each construct of the lang.
The syntax of a lang. has two parts: Lexical Syntax and Phrase Syntax. The lexical syntax explains
how characters are grouped into tokens and the phrase syntax explains how tokens are arranged in
the program.
Keywords are the alphabetic character sequences that are meaningful in a language. E.g:
if..else, do..while etc.
The actual character sequence used to write down an occurrence of a token is called its spelling.
The tree representation of an abstract syntax is called an Abstract Syntax Tree. It shows the
operator-operand relation of an expression. An operator is represented by the root or an
intermediate node, and the basic operands appear at the leaves. It is a condensed form of the parse tree.
Another method of representing the syntax is Formal grammars. Each programming lang. has a
vocabulary of symbols and rules for how these symbols are put together to form phrases. These
rules are called a grammar. The grammar describes the structure of a lang. without considering the
semantics. The concrete syntax of a lang. is its written representation, including lexical
details such as the placement of keywords and punctuation marks.
A Context Free Grammar (CFG) gives the notation for specifying concrete syntax. The
methods for writing this grammar include BNF. A grammar defines a set of all possible phrases
that constitute programs in the subject lang. together with their syntactic structures.
A CFG has four parts:
A set of symbols known as terminal symbols, which are the non-divisible symbols of the lang.
By convention, the terminals are represented by lower case letters.
A set of non-terminal symbols, also known as variable symbols, which stand for the constructs of the
language. They represent the intermediate definitions within the lang.
A set of rules known as production rules that are used to define the formation of the
constructs.
A distinguished non-terminal known as the Start symbol.
The grammar of the lang. imposes a hierarchical structure on the symbols. The tree
representation of this structure is called a parse tree. In a parse tree, the root is the start symbol, the
non-terminals are represented by intermediate nodes and the terminals are represented by
leaves. The parent-child relation is based on the productions of the grammar.
Consider the grammar rule “A real number has an integer part, a point and a fraction part. The
integer part can have one or more digits. The fraction part may also contain one
or more digits.” Using this grammar, 12.11 is represented as follows.
Here a left to right scan of the leaves retrieves the original expression or phrase. Building
such a tree for a phrase is called parsing.
The concept of the CFG can be independent of the notation used to write grammar.
Backus-Naur-Form (BNF) is one such form to represent the grammar. The BNF
notations include:
A production has two parts, left and right (head and body). The head and body are separated by
the “::=” symbol, read as “can be”.
Alternative bodies are separated by the “|” symbol, read as “or”.
BNF example:
<RN>::= <INT>.<FRACTION>
<INT>::= <INT><DIGIT>|<DIGIT>
<FRACTION>::=<DIGIT><FRACTION>|<DIGIT>
<DIGIT>::=0|1|2|3|4|….|9
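The real-number grammar can be turned into a small recognizer. The following is only a sketch in Python; the function names parse_digits and is_real_number are illustrative assumptions, and the digit 0 is assumed to be an allowed digit.

```python
# Hypothetical recognizer for the real-number grammar:
#   <RN> ::= <INT> . <FRACTION>
# <INT> and <FRACTION> are both "one or more digits".

def parse_digits(text, pos):
    """Consume one or more digits starting at pos.

    Returns the position after the last digit, or None if no digit is found.
    """
    start = pos
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    return pos if pos > start else None


def is_real_number(text):
    """Return True if text can be derived from <RN>."""
    pos = parse_digits(text, 0)            # <INT>
    if pos is None or pos >= len(text) or text[pos] != ".":
        return False                       # the point is mandatory
    pos = parse_digits(text, pos + 1)      # <FRACTION>
    return pos == len(text)                # nothing may follow the fraction
```

For instance, is_real_number("12.11") holds, while "12." and ".5" are rejected because one of the two digit parts is missing.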
Ambiguity: A grammar that assigns two or more distinct parse trees to some phrase of its
language is known as a syntactically ambiguous grammar. If ambiguities exist, they are resolved
by establishing conventions that rule out all but one of the trees.
E.g Consider the grammar, E::=E-E|0|1. The representation of 1-0-1 using this grammar has two
parse trees as follows:
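The two trees for 1-0-1 can be enumerated mechanically. The sketch below, in Python, tries every "-" as the top-level operator; the nested-tuple encoding of trees and the names parse_trees and value are assumptions made for illustration.

```python
# Enumerate all parse trees of a token list under E ::= E - E | 0 | 1.
# A tree is either an int (leaf) or a pair (left_subtree, right_subtree).

def parse_trees(tokens):
    """Return every parse tree for tokens as nested tuples."""
    if len(tokens) == 1:
        return [int(tokens[0])] if tokens[0] in ("0", "1") else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok == "-":                      # try this "-" as the root operator
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, right))
    return trees


def value(tree):
    """Evaluate a tree: leaves are numbers, inner nodes subtract."""
    if isinstance(tree, int):
        return tree
    left, right = tree
    return value(left) - value(right)


trees = parse_trees(["1", "-", "0", "-", "1"])
values = sorted(value(t) for t in trees)
```

The two trees correspond to (1-0)-1 and 1-(0-1), which evaluate to 0 and 2, so the ambiguity changes the meaning and not only the shape of the tree.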
Dangling Else Ambiguity is a well known example of syntactic ambiguity, which arises if the
grammar has productions of the following form: S ::= if E then S | if E then S else S.
Consider the conditional: if E1 then if E2 then S1 else S2. The statement is ambiguous
because it is not clear to which if the else belongs. The parse trees in this case are:
The dangling else ambiguity is typically resolved by matching an else with the nearest
unmatched if. Thus the second option will be selected by the compiler.
Variants of Grammar
These are variations on BNF used to represent grammars. They include Extended BNF (EBNF) and
Syntax Diagrams.
EBNF contains some extra notations that are not in BNF. The extra notations in EBNF are
Syntax diagrams are pictorial representations of a grammar. The syntactic description and the
equivalent BNF are represented graphically. The terminals are represented by circles and the non
terminals are represented by square boxes. Eg: The syntax diagram of a grammar for real
number.
Attributed Grammar
Natural Semantics
Denotational Semantics
An attributed grammar is a formal way to define attributes for the productions of a formal
grammar, associating attribute values with the symbols. Attributes are of two types: synthesized and inherited.
Synthesized attributes get their values from the attributes attached to the children of the non
terminal. i.e. they are the results of evaluating the attribute rules. For example consider the
grammar for an expression evaluator:
In this tree representation the nodes get their value or attribute from
the children nodes. That is why these are called synthesized attributes. The use of synthesized
attributes for the representation of the semantics is also called the Syntax Directed approach of
defining semantics. A representation in which the productions and semantic rules are written
together is called a Syntax directed definition.
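The bottom-up flow of synthesized attributes can be sketched in Python. This is an illustration only: the tuple encoding (op, left, right) and the name synthesize_val are assumptions, not notation from these notes.

```python
# Each node's attribute "val" is computed from the "val" of its children.

def synthesize_val(node):
    """Return the synthesized attribute val of an expression-tree node."""
    if isinstance(node, int):              # leaf: val is the number itself
        return node
    op, left, right = node
    left_val = synthesize_val(left)        # children are evaluated first ...
    right_val = synthesize_val(right)
    if op == "+":                          # ... then the rule for the parent
        return left_val + right_val        # combines the children's values
    if op == "*":
        return left_val * right_val
    raise ValueError("unknown operator: " + op)


tree = ("+", 2, ("*", 3, 4))               # abstract syntax of 2 + 3 * 4
result = synthesize_val(tree)
```

Here the node for * gets the value 12 from its leaves, and the root then gets 14 from its two children.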
Inherited Attributes are passed down from parent nodes or sibling nodes. i.e. inherited
attributes are attributes that are passed to a rule, as opposed to synthesized attributes, which are
returned from a rule. When a non-terminal A appears on the right side of a production, it
receives its inherited attribute (say A.in) from that production.
Natural semantics associates logical rules with the syntax of a language. The logical rules can be
used to deduce the meaning of a construct. This is used with the PROLOG environment.
It is read as follows: “If E1 has value v1 and E2 has value v2, then plus E1 E2 has value v1 +
v2.” In the expression plus a b the result depends on the values of a & b. So, to handle these values
we introduce an environment env.
Denotational semantics associates a denotation (meaning) with each syntactic phrase
in the programming language. The syntactic phrase being given a meaning is enclosed in double
brackets [[ ]]. The definition has two parts: Domains (identify
the types and syntax relevant to the lang.) and Semantic rules (synthesize the meaning of a
construct in terms of its components). So 2*4, 5+3 and 008 all denote the
same value 8 in denotational semantics. I.e. meaning [[2*4]] = meaning [[5+3]].
Expression Notation
Programming languages use a combination of notations called infix, postfix, prefix and
mixfix to represent expressions.
(Try the definitions of infix (E1 op E2) and postfix (E1 E2 op).)
Associativity is used to deal with multiple occurrences of operators of the same precedence
level. An operator can be left associative (multiple occurrences of the operator are grouped from left to
right, eg: +, -, *, /) or right associative (multiple occurrences of the operator are grouped from
right to left, eg: exponentiation).
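A quick way to see the effect of associativity is to compare the two possible groupings. The sketch below uses Python, where - is left associative and ** (exponentiation) is right associative; the variable names are illustrative.

```python
# Left associative: 8 - 3 - 2 is grouped as (8 - 3) - 2.
subtraction = 8 - 3 - 2
left_grouped = (8 - 3) - 2
right_grouped = 8 - (3 - 2)

# Right associative: 2 ** 3 ** 2 is grouped as 2 ** (3 ** 2).
power = 2 ** 3 ** 2
power_right_grouped = 2 ** (3 ** 2)
power_left_grouped = (2 ** 3) ** 2
```

The ungrouped subtraction agrees with the left grouping, and the ungrouped power agrees with the right grouping; the other grouping gives a different value in both cases.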
D. IMPERATIVE PROGRAMMING
The program flow is based on the actions, placement of statements. The reader can understand
the program from its structure itself.
Design Principles:
The important considerations for the design of imperative paradigm are:
Structured Programming: The structure of the program should be understandable. i.e.
from the structure and the function & variable names we can identify the meaning of the
program.
Efficiency: A language must allow the underlying assignment-oriented machine
to be used directly and efficiently.
Syntax Directed Control Flow:
Structured control Flow: A program is structured if the flow of control through the
program is evident from the syntactic structure of the program text. i.e. the program text
should help us to understand what the program does. In imperative lang., this means single entry/
single exit control flow.
Composition of Actions: A sequence of statements can be grouped into a compound
statement by enclosing it between the keywords begin and end. In C, we use { } for the same
purpose. This is useful for block oriented control flow.
Selection Statements: A selection statement selects one of two alternative substatements for execution. E.g.:
if <exp> then S1 else S2. If conditions are nested, avoid nesting in the then part, i.e. use
nested if as:
if ….then…
else if …then….
else if…..then….
else…….
Looping Constructs: The repeated execution of a block of code can be done with the help
of looping constructs. These are of two types:
Indefinite Iteration: The no. of executions cannot be pre-determined. Eg:
while <exp> do <statement> repeats the statement (body) as long as the expression is true.
It checks the expression before entering the loop.
repeat S until <exp> repeats the statement until the condition becomes true. It checks the
exp after the first execution.
Definite Iteration: The no. of executions is known before the loop starts. Eg:
for i := 1 to 10 do A[i] := 10;
Case Construct: A case statement uses the value of an expression to select one of
several alternatives (substatements) for execution.
case exp of
<constant 1>: <statement1>
Figure: Code layout (translation for E; if E fails goto; translation of S; goto)
But, case statements are difficult to implement efficiently if the case constants are not consecutive. So,
we have three methods for implementing case statements:
If there are fewer than 7 cases, implement them using conditionals i.e. nested if…else.
If there are many cases and the constants are dense, a special data structure called a Jump Table is used. The
ith entry of the jump table holds the address of the code for constant i.
If there are many cases and many of the jump table entries would be vacant, then the compiler
implements the cases using a Hash Table.
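The difference between the jump table and hash table strategies can be sketched as follows. This is a Python illustration, not how any particular compiler emits code; the handler names and the sample constants are assumptions.

```python
# Handlers stand in for the code of each case alternative.
def on_zero():
    return "zero"

def on_one():
    return "one"

def on_two():
    return "two"

def on_other():
    return "other"

# Jump table: entry i holds the code for constant i (dense constants 0..2).
JUMP_TABLE = [on_zero, on_one, on_two]

def case_with_jump_table(value):
    """Select the alternative by direct indexing."""
    if 0 <= value < len(JUMP_TABLE):
        return JUMP_TABLE[value]()
    return on_other()

# Hash table: suits sparse constants, where a jump table would be mostly vacant.
HASH_TABLE = {0: on_zero, 1000: on_one, 50000: on_two}

def case_with_hash_table(value):
    """Select the alternative by hashing the case constant."""
    return HASH_TABLE.get(value, on_other)()
```

With the sparse constants 0, 1000 and 50000, a jump table would need 50001 entries for three alternatives, which is why hashing is preferred in that situation.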
Two important syntactic concerns are the placement of semicolons and the dangling else ambiguity.
Delimiters like ; are used either to terminate or to separate statements. In Pascal notation, if it
appears between elements it is considered a separator, and if it is at the end of each
statement it is considered a terminator. To avoid this confusion Modula-2 uses the key
word end to terminate the statement.
The dangling else ambiguity arises when an if is nested in the then part. This results in the
problem of which if the else belongs to. Most lang. resolve it by matching the else with the
nearest unmatched if. Modula-2 avoids this by ending the statement with the end keyword. So, if exp then
if exp then S1 else S2 is written in one of the forms:
Role of Types:
A data object means something meaningful to an application. The data representation means the
organization of values in a program. Thus objects have corresponding representations in the program.
These data representations are built up from values that can be manipulated directly by the underlying
machine. How these data representations are built up is described by types. So, the role of types can
be:
Basic Types:
They are also called atomic or primitive types.
Enumerations: Eg: type day = (mon, tue, wed, thu, fri, sat, sun); Here mon,…,sun are called the elements of type day.
The operations in enumeration are
Integers and Reals: The values of integers and reals are determined by the underlying machine.
Commonly integers range from -maxint to maxint. The operations with integers and reals are
Short circuit Evaluation: In this the second operand is evaluated iff necessary. This is a technique
for compiler optimization.
Eg: if (i>10 && a==b): here if i<=10, the whole expression is false, so a==b is not evaluated.
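Short circuit evaluation can be observed by recording which operands are actually evaluated. The sketch below uses Python's and operator, which short-circuits like && in C; the helper check and the list calls are assumptions added for illustration.

```python
calls = []   # labels of the operands that were actually evaluated


def check(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result


i, a, b = 5, 1, 1
# i > 10 is false, so the right operand is never evaluated.
outcome = check("i>10", i > 10) and check("a==b", a == b)
```

After this runs, calls contains only the label of the first operand, which shows that a==b was skipped.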
Subrange: Subranges are a special case of basic types. They restrict the range of values of an existing
type. eg: 0..99 or type year = 1900..1999.
Layout of basic Types: The basic types are laid out by using machine representation values. On
most machines char fit in a byte, integers in a word, and real numbers in two contiguous words.
Compound Types:
They are built up from basic types. Eg. Array, Record, union, set and Pointers
Arrays:
An array declaration specifies the index of the first and last elements in the array and the type of all elements
in the array. An array is a collection of elements of the same (homogeneous) type. An array is declared by the
following syntax: array[<simple>] of <type>. Here simple represents the index range of the array and type
is the type of the elements in the array.
Array Layout: The array is stored as memory cells in consecutive locations; the cell width depends
on the type. var A: array[low..high] of T will create an array of cells indexed from low to high. The
low and high are the bounds of the indices of array A.
If base is the address of A[low] and w is the width of each element, the address of A[i] is
base + (i - low)*w, which can be rewritten as i*w + (base - low*w).
In this, i*w is calculated at run time. The portion base - low*w can be pre-computed as a constant c at
compile time. Then the address of A[i] will be i*w + c.
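The split between compile-time and run-time work in the address calculation can be sketched as below. This is a Python illustration; base, low and width are assumed sample values, and the function name make_address_function is hypothetical.

```python
def make_address_function(base, low, width):
    """Precompute the constant part once, as a compiler could."""
    c = base - low * width              # "compile time" constant

    def address_of(i):
        return i * width + c            # only i * width is done "at run time"

    return address_of


# Assumed example: array[5..9] of 4-byte elements starting at address 1000.
address_of = make_address_function(base=1000, low=5, width=4)
first = address_of(5)
third = address_of(7)
```

The result agrees with the direct formula base + (i - low) * width, so first is the base address itself.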
In Algol 60, the array bounds can be given by expressions, including conditionals:
var A: array[(if c<0 then 2 else 3)..20] of integer. Here, if c is known at compile time then the layout can be done at compile time.
In C, the layout is known at compile time and storage allocation is done at run time. If the keyword
static is used, then both storage and layout are done at compile time. So the calculation of array
bounds can be:
Static evaluation
Evaluation upon procedure entry
Dynamic Evaluation
Records:
Records allow the variables relevant to an object to be grouped together and treated as a unit. The different
elements are called the fields of the record. The declaration syntax is:
record
<name1>:<type1>;
<name2>:<type2>;
…..
<name k>:<type k>;
end;
Eg: type complex=record
re, im: real;
end;
A record is used to represent objects with the same properties. A variant record represents
objects with some but not all properties in common, i.e. it has a fixed part and a variant part. A union
is a special case of a variant record, in which no common fields exist.
Example: Consider a binary tree. In the tree a node can have no child, one child, or left and right
children. So,
type kind = (leaf, unary, binary);
node = record
C1: T1;
C2: T2;
case k: kind of
leaf: ();
unary: (child: T3);
binary: (lchild, rchild: T4);
end;
Here C1 and C2 are the common fields, of types T1 and T2. The third part depends on the tag k of type kind.
The space reserved for a variant part is just enough to hold the fields in the largest variant.
Set:
A set is a collection of elements. It can be efficiently implemented using bits. A set is written
using [ ] brackets. It can be empty [ ], a subrange [1..3] or a list [+, -, *, /].
In the bit implementation the presence of an element is represented by 1 and its
absence by 0. So, for elements drawn from 1..3, [1,3] is represented as 101. So, a set over n possible
elements can be represented by an n bit vector.
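The bit-vector representation can be sketched in Python using integer masks. The universe 1..3 is taken from the example above; the names UNIVERSE, to_mask and show are assumptions for illustration.

```python
UNIVERSE = [1, 2, 3]    # possible elements, leftmost bit first


def to_mask(elements):
    """Build the bit vector: bit is 1 if the element is present."""
    mask = 0
    for position, member in enumerate(UNIVERSE):
        if member in elements:
            mask |= 1 << (len(UNIVERSE) - 1 - position)
    return mask


def show(mask):
    """Render the mask as a string of n bits."""
    return format(mask, "0{}b".format(len(UNIVERSE)))


s = to_mask({1, 3})
union = s | to_mask({2})          # set union is bitwise OR
intersection = s & to_mask({3})   # set intersection is bitwise AND
```

Here show(s) gives the string 101 from the text, and union and intersection each take a single machine operation on the masks.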
Pointers:
A pointer is a value that provides indirect access to an element of a known type. It is used
for:
Efficiency: Instead of moving or copying a large data structure, move or copy pointers to it.
Dynamic Data: Data structures that grow and shrink during execution.
Operations On Pointers:
Dereferencing: ↑ does double duty as prefix and postfix. The type ↑T is the type of pointers
to objects of type T. p↑ denotes the object pointed to by p; the latter is called dereferencing.
Dynamic Allocation on heap: new( ) is used to allocate a new object. new(p)
leaves the pointer p pointing to a newly allocated data structure of type T on the heap.
Assignment: Pointer assignment can be performed between pointers of the same type.
Equality Testing: Two pointers are equal if they point to the same data structure. If not,
they point to different locations.
Deallocation: The deallocation of storage is done by the dispose() command.
Linked Lists: Data structures that grow and shrink can be constructed using records and
pointers. By the static layout principle, the size and layout of the storage used for each type are
known statically before the program runs. So, we can define fixed size cells as:
type link = ↑cell;
cell = record
info: integer;
next: link;
end;
Dangling Pointers: A dangling pointer is a pointer to storage that is being used for another purpose; typically the
storage has been deallocated. Storage that is allocated but inaccessible is called garbage. A
program that creates garbage is said to have a memory leak. Dangling pointers arise in the opposite
situation, when storage is deallocated while a pointer to it is still in use.
Example: p:=q leaves p & q pointing to the same cell, the one pointed to by q. The cell previously pointed to by p is now
inaccessible (unless another pointer refers to it) and so it is a memory leak.
Types and Error Checking: Types can be used for the binding of values to
expressions and for error checking of expressions. So, here types extend from values to
expressions.
Variable Binding: In most imperative languages, a fixed type is assigned (bound)
to a variable. If a variable is declared as integer, it must denote an integer value. This
value can be changed during execution. So, a variable binding associates a property with
a variable. A binding is static if it is done at compile time (early binding), and if it is done at
run time it is called dynamic binding (late binding).
Type Systems: The widely followed principle is that every expression should have a
type that is known at compile time. The type system for a language is a set of rules for
associating a type with an expression. A type system rejects an expression if it cannot
associate a type with the expression. Example for a type system: If expressions E and F
have the same type, then expressions E+F, E-F, E*F and E/F have that same type.
Basic Rule for Type Checking:
When a function from a set A to a set B is applied to an element of set A, the result is an
element of set B.
Arithmetic Operators are functions. Associated with each operator op is a rule that
specifies the type of an expression E op F in terms of the types of E and F. Eg: If E and F are integers
then E + F has the type integer.
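The rule "if E and F have the same type, then E op F has that type" can be sketched as a tiny type checker. This is a Python illustration without coercion; the tuple encoding (op, E, F), the type names "integer" and "real", and the function name type_of are assumptions.

```python
def type_of(expr, declared):
    """Return the type of expr, or raise TypeError if no rule applies.

    expr is an int or float literal, a variable name (str), or a tuple
    (op, E, F). declared maps variable names to their types.
    """
    if isinstance(expr, int):
        return "integer"
    if isinstance(expr, float):
        return "real"
    if isinstance(expr, str):
        return declared[expr]
    op, e, f = expr
    type_e = type_of(e, declared)
    type_f = type_of(f, declared)
    if op in ("+", "-", "*", "/") and type_e == type_f:
        return type_e                   # E op F has the common type
    raise TypeError("no type for {} {} {}".format(type_e, op, type_f))


checked = type_of(("+", "i", 2), {"i": "integer"})
```

Since this sketch has no coercion rule, an expression such as 2 * 3.14 is rejected; a type system with coercion would first convert the 2 to a real.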
Overloading: The operators + and * can have different meanings depending on the context.
This is called overloading.
Coercion: An implicit type conversion by the compiler is called coercion. In 2*3.14 the
2 is coerced into a real.
Polymorphism: A polymorphic function has a parameterized type or generic type. For
example data structure like stack, queue can be defined on any type. So stack of integers
and stack of characters are possible. A polymorphic type allows a data structure to be
defined once and then applied later to any desired type.
Type Equivalence:
var x, y: array[0..9] of integer
var z: array[0..9] of integer
In Pascal, x and y have the same type but z does not. In C, x, y and z all have the same type.
This leads to the definition of structural equivalence.
So, by these rules, char and char are structurally equivalent and so are S and E:
type S = array [0..99] of char;
Name Equivalence: Name equivalence can be of the following types, by applying
SE1-SE3:
Pure Name Equivalence: A type name is equivalent to itself, but no constructed
type is equivalent to any other constructed type.
Transitive Name Equivalence: A type name can be equivalent to itself and can be
declared equivalent to other type names. In the following example, T, S, U are
equivalent to each other and equivalent to integer, because integer is a type name
also.
type S=integer;
type T=S;
type U=integer;
Type Expression Equivalence: A type name is equivalent only to itself. Two type
expressions are equivalent only if they are formed by applying the same constructor to
equivalent expressions. i.e. the expressions have to be identical.
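The contrast between structural and pure name equivalence can be sketched as below. This is a Python illustration only: the type names A and B, the tuple encoding ("array", low, high, element_type) and both function names are hypothetical, and circular types are not handled.

```python
# Two hypothetical declarations with identical structure.
DECLARATIONS = {
    "A": ("array", 0, 99, "char"),
    "B": ("array", 0, 99, "char"),
}


def expand(t):
    """Replace a declared type name by its definition."""
    while isinstance(t, str) and t in DECLARATIONS:
        t = DECLARATIONS[t]
    return t


def structurally_equivalent(s, t):
    """Same basic type, or same constructor applied to equivalent parts."""
    s, t = expand(s), expand(t)
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return s == t                   # basic type names or array bounds
    if s[0] != t[0] or len(s) != len(t):
        return False
    return all(structurally_equivalent(x, y) for x, y in zip(s[1:], t[1:]))


def pure_name_equivalent(s, t):
    """A type name is equivalent only to itself."""
    return isinstance(s, str) and s == t
```

Under this sketch A and B are structurally equivalent but not name equivalent, which mirrors the difference described above.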
Circular Types: Linked data structures give rise to recursive or circular types.
Static and Dynamic Type Checking: Type checking ensures that the operations in a program
are applied properly. It is a mechanism to prevent errors. A type error occurs if a function f
expects an argument of type S, but f is applied to some a that does not have type S. A program that
runs without type errors is called type safe.
In Static checking, programs are examined for type errors during translation. Using the rules
of a type system, a compiler can infer from the source text that a function f will be applied to an
operand a of the right type each time the expression f(a) is evaluated.
Dynamic checking is done at run time, by inserting extra code into the program to
detect impending errors. The extra code for dynamic checking uses space and time, so it is
expensive and seldom used. i.e. static checking is usually sufficient for the program and the dynamic values
of types are rarely checked.
The effectiveness of a type system can be described as strong or weak. A type
system is Strong if it accepts only safe expressions. i.e. the expressions accepted by a strong
system are guaranteed to be evaluated without type errors. A type system is Weak if it is not strong.
A weak system allows some unsafe programs to slip through.
Introduction to Procedures
Procedures are a construct for giving a name to a piece of code, called the body. When the name is
called the body is executed. Each execution of the body is called an activation of the procedure.
A variable name in an imperative lang. goes through a sequence of mappings from the source text to its value
at run time. These mappings include scope, activation and state.
Example:
A recursive procedure is a special case, where the procedure can be activated from within its own
procedure body, either directly by calling itself or indirectly through a call from another procedure. A recursive
procedure can have multiple activations in progress at the same time. Example:
function f(n: integer):integer begin if n=0 then f:=1 else f:=n*f(n-1); end;
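The idea of several activations being in progress at once can be made visible by counting them. The sketch below restates the factorial function in Python; the extra depth parameter and the max_depth counter are assumptions added only for the illustration.

```python
max_depth = 0    # largest number of activations of f in progress at once


def f(n, depth=1):
    """Factorial; depth counts the activations currently in progress."""
    global max_depth
    max_depth = max(max_depth, depth)
    if n == 0:
        return 1
    return n * f(n - 1, depth + 1)    # a new activation starts before this one ends


result = f(4)
```

Computing f(4) activates f for n = 4, 3, 2, 1 and 0, so five activations are in progress when the innermost one runs.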
Benefits of Procedures:
Parameter passing refers to the matching of actuals with formals when a procedure call occurs.
Differing interpretations of what a parameter is lead to different parameter passing methods. They
are: call by value, call by reference and call by value-result.
Call by value: Under call by value the formal parameter receives the value of the actual
parameter. Let x be a formal of procedure P(x) and we call P(E); then x takes the value of E. In this
case changes to the formal x have no effect on the actual E.
Call by value Result: This is also called copy-in/copy-out because the actuals are initially copied
into the formals and the formals are eventually copied back out to the actuals. Actuals with locations
are treated as follows:
Copy in Phase: Both the values and locations of the actual parameters are computed. The
values are assigned to the corresponding formals, as in call by value, and the locations are
saved for the copy out phase.
Copy out phase: After the procedure body is executed, the final values of the formals are
copied back to the locations computed in the copy in phase.
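The two phases can be simulated with an explicit store of variables. This is a sketch in Python, which has neither mechanism built in; the dictionary store, the procedure increment and both calling functions are assumptions for illustration.

```python
def increment(formal):
    """Procedure body: changes its formal and reports the formal's final value."""
    formal = formal + 1
    return formal


def call_by_value(body, store, actual_name):
    """The formal gets a copy of the value; the actual is never updated."""
    body(store[actual_name])


def call_by_value_result(body, store, actual_name):
    """Copy-in, run the body, then copy-out to the saved location."""
    location = actual_name              # copy-in phase: location is saved ...
    formal = store[location]            # ... and the value goes to the formal
    final_value = body(formal)
    store[location] = final_value       # copy-out phase


store = {"a": 1}
call_by_value(increment, store, "a")
after_value = store["a"]
call_by_value_result(increment, store, "a")
after_value_result = store["a"]
```

After the call by value the variable a is unchanged, while the call by value-result copies the incremented formal back into a.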
Scope rule for Names
Nested Scope
Activation Records
Associated with each activation of a procedure is storage for the variables declared in the
procedure. The storage associated with an activation is called an activation record. In a sequential
language, control can be in at most one procedure at a time. When P calls Q, execution of P is suspended
when control flows to Q and execution of P resumes when control returns from Q. So, control
flows between procedures in a last in first out (LIFO) manner.
The flow of control between activations can be depicted by a tree called an activation tree. The nodes
in the tree represent activations. When activation P calls activation Q, the node for P has the
node for Q as a child. If P calls Q before it calls R, then the node for Q appears to the left of the
node for R:
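An activation tree for the case just described can be recorded by tracing calls. The sketch below is in Python; the Activation class, the traced decorator and the empty procedures P, Q and R are assumptions made for illustration.

```python
class Activation:
    """A node of the activation tree."""
    def __init__(self, name):
        self.name = name
        self.children = []


stack = [Activation("main")]    # activations currently in progress (LIFO)


def traced(func):
    """Wrap func so that every call adds a node under its caller."""
    def wrapper(*args):
        node = Activation(func.__name__)
        stack[-1].children.append(node)   # child of the current activation
        stack.append(node)
        try:
            return func(*args)
        finally:
            stack.pop()                   # control returns to the caller
    return wrapper


@traced
def Q():
    pass


@traced
def R():
    pass


@traced
def P():
    Q()    # P calls Q before it calls R
    R()


P()
root = stack[0]
p_node = root.children[0]
```

The children of the node for P are recorded in call order, so Q appears to the left of R, and the stack is back to the single main activation once P returns.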
Control Link
Access Link
Saved state
Parameters
Function result
Local variable
Data needed for the activation of a procedure is collected in a record called an activation record or
frame. The record contains storage for local variables, formal parameters and any additional
information needed for the procedure activation. E.g.:
var i: integer;
c: char;
begin…end;
In this code, the activation record includes storage for the parameter u, the Boolean result, the
variables i and c, any temporary storage needed for evaluating expressions within the function
body and storage needed to manage activations.
In a language with recursive procedures, each activation has its own activation record.
Control link also called dynamic link points to the activation record of the run time caller.
Access link also called static link is used to implement lexically scoped languages.
The stack discipline allows storage to be reused efficiently. Storage for local variables is
allocated when an activation begins and it is released when the activation ends. The records are
allocated in LIFO manner.
Drawback: A stack can be used only if storage for locals is not needed once an activation ends, so that the
storage can then be deallocated.
A more general technique for managing activation records is to allocate storage for them in an area
called the heap. The records stay on the heap as long as they are needed. A technique called garbage
collection is used to automatically reclaim storage that is no longer needed.
The life time of an activation record begins when the activation record is allocated and ends
when the locations in the activation record can no longer be accessed from the program. That is,
allocation and life time are not tied to the LIFO discipline.
The static variables within a procedure retain their values between activations: they are shared
by all activations of the procedure. Local variables are bound to distinct storage in each
activation, except if they are declared to be static. The life time of a static variable is the entire
computation.