You are on page 1of 22

UNIT 4

CHAPTER 9
SYMBOL TABLES
Symbol table
 Symbol table is a table that has entries for all the symbols in the program.
 A compiler has to collect all the names appearing in the source program and enter them in
symbol table.
 Each entry in the symbol table is of the form name, information
 Each time when a name is encountered, Symbol table is searched to see whether the name
is already there.
 If it is a new name, then it is inserted in the symbol table.
 Information about the name is entered in lexical and syntactic analysis.
9.1 Contents of Symbol table
Symbol table is a table with 2 fields (name and information)
Name – name of the symbol (label, identifier, constant, procedure name, array name)
Information – Attributes , Parameters, offset
Features:
We must be able to do the following things
1. determine whether a given name is present in the table
2. add new name in the table
3. Access the information associated with the name, if the name is present
4. Add new information for the given name
5. Delete a name or group of names from the table

There may be separate tables for variable names, labels, procedure names, constant and field
names etc.
Data structures used for the symbol table
 Linear list
- It is very slow
- But, Simple and Easy to implement
 Hash table
- It is fast
- But more complex
 Tree structure
- Intermediate in performance

Information field in the symbol table should include :


1. String denoting the name. If the same name is used in more than one block or procedure,
then an identification of which block or procedure the name belongs to.
2. Attributes of the name. (type , scope ) and information
3. Parameters ( number of dimension, upper limit, lower limit)
4. Offset describing the position in storage.

Name and symbol table records

Linear array of records can be used to implement a symbol table.


Direct method:
 If there is a upper limit on the length of the identifier, then direct scheme can be used.
 FORTRAN uses direct scheme
Record
. .
. .
. .

Identifier DIMPLE
Information Integer, variable
Identifier AMAX
Information Integer, const
. .
. .
. .

Indirect method
 If there is no limit on the length of the identifier, then indirect scheme can be used.
 ALGOL uses indirect scheme
.
.

Length of the identifier(6)


.
Information (attributes)

Length of the identifier(9)


.
Information (attributes)

.
.

. . . R F D I M P L E Q U . . .
Separate string array

Reusing symbol table space :

 A compiler can be designed to run in lesser space, if the space occupied by the identifiers
can be reused.
 If there are no further references to the identifier, then the space can be reused in subsequent
passes.
 Eg. The space occupied by the string table(indirect scheme) can be reused in subsequent
passes.
 In the direct scheme also, if the identifier and information are stored as 2 separate arrays,
then the space occupied by the identifiers can be used in subsequent passes.

Array names :
 If the identifier is array name , then all the subscript information should be entered in the
symbol table.
 FORTRAN limits to 3 dimensions
 In all the case, the lower limit is 1
The information regarding array subscript will look like:

UL1 UL2
UL3 B

A(3)(2)(10)
UL1 3
UL2 2
UL3 10
9.2 Data structures used for symbol tables :
3 data structures can be used to represent symbol table
 Linear list
 Search trees
 Hash tables
Linear lists:

Name 1
Information 1
Name 2
Information 2
.
.
.

Name n
Information n
Available

 Simplest scheme
 Single array or several arrays can be used to store names and associated information.
 New names are added to the list, in the order in which they occur.
 To get information for a name, searching is done from the beginning of the array upto the
position marked by the pointer called AVAILABLE.
 AVAILABLE pointer points to the beginning of the empty portion of the array.
 When the name is found, the information is read from the next location.
 If AVAILABLE is reached without finding names, then fault occurs.(undefined name)
 To insert a new name, scan down the list from the beginning(to ensure it is not already
there)
 If it is available, fault occurs (multiply defined name)
 If it is not available, then the new name is stored immediately following the AVAILABLE
and then pointer is incremented.
Time:
 If a symbol table contain n names, then to insert new name, the time needed is proportional
to n
 To find a data about name, on an average n/2 names have to be searched.
Cost of inquiry is proportional to n
 To insert n names and m inquiries, the total cost is cn(n+m), where c is a constant
Merits:
1. Very easy to implement
2. Minimum space is taken
Demerits:
1. Very slow in retrieving data
2. Useful only for small jobs
3. Performance is poor when n and m becomes large.
Self organizing lists:

 LINK field is added to each record.


 List is searched in the order pointed by the LINK field.
 Every time when a name is referred, it is moved to the front of the list.
 When a set of names are often referred, they will come to the front.
The figure a) shows the e.g of 4 name symbol table
 FIRST gives the position of first record in the list
 LINK field indicates the next record in the list
The order is NAME3 , NAME 1 , NAME 4 , NAME 2
The figure b) shows the effect of moving NAME4 to the front of the list.
 While searching for NAMEi, remember previous name on the list NAMEp
 Temporarily remove entry i from the list
 Link of Previous name(Linkp) should point to the link, where link of i (Linki) points.
 Make Linki point where FIRST points. Then we move NAMEi to the front of the list by
making first point to NAMEi
firstN3N1N4N2
Eg: search for N4.
 Previous name: N1
 Temporarily remove N4
 Link of name1 should point to link2, where link 4 points (ie name2)N1N2
 Link of name4 should point to the place where first points.(ie name3) ie N4N3
 Move name4 to the front or FIRSTN4
 first N4N3N1N2
Merits:
 Frequently accessed records are moved to the front. Hence time is saved
Demerits:
 If the references are Random, it takes more time to locate the record.

Binary search trees


 Add 2 link fields to each record. Left, Right
 These fields are used to link the records into a binary search tree
Search tree property:
 All the names Namej, accessible from Namei following the left link Lefti will precede
Namei in alphabetical order. Namej < Namei
 All the names Namek, accessible from Namei following the Right link Righti will
follow Namei in alphabetical order. Namei < Namek
 During search,

 if Name < Namei, follow left link


 if Namei < Name, follow right link
 if Name = Namei, item found

Algorithm for finding a NAME in binary search tree

While p≠ null do
If NAME = NAME(P) then
…../* Name found, take action on access */
else If NAME < NAME(P) then
P= LEFT(P) /* visit the left child */
else
/* NAME(P) < NAME */
then P= RIGHT(P) /* visit the right child */

/* if we fall through the loop, we have failed to find NAME*/

Time:
 If names are encountered in random order, average length of the path is proportional to log n
 To insert n names and m inquiries, time is proportional to (n+m) log n
Merit:
 Binary search gives better performance than linear search
 Search is narrowed quickly
Demerit
 Implementation is difficult than linear list

Hash Tables:

 Two tables (hash table and storage table) are used.


 Hash table contain k words numbered from 0,1,2,…. K-1.
 These words are pointers to the storage table to the heads of k separate linked lists
 Each record in the symbol table appears exactly in any one of the lists.

To enter a NAME in symbol table :


 Create a record for it, in the available space in storage field
 Link that record to the beginning of that particular list
To determine whether the name is in the symbol table
 Apply a hash function h to NAME.
 h(NAME ) is an integer between 0 to k-1
 follow the link corresponding to that link.

To inquire about a name :


 Calculate h(name)
 Follow that list alone
We have to design a hash function h such that,
 h will distribute names uniformly among the k lists
 h is easy to compute for names consisting of string of characters

Cost :
To insert n names and m inquiries, time is proportional to n (n+m) / k
Advantage:
 This method is superior to linear search and binary search.
 Provides best Performance

Disadvantage:
 more programming effort is needed
 Also it needs more space
UNIT 5
CHAPTER 10
RUNTIME STORAGE ADMINISTRATION
10.1 Implementation of stack allocation scheme
 Local variables can be accessed only by the procedure in which it is declared.
 Global variables can be accessed and are available to all procedures.

- A Runtime stack starts from the high numbered memory location


- Low numbered memory locations contain code for the procedures and space for global data.
- High numbered memory locations(ie bottom of the stack ) have a runtime stack
- A procedure P (main program ), calls procedure Q which in turn calls procedure R
Figure shows the organization of memory

Static area for


programs and global data

TOP
Extra storage for R
Activation record for R
SP
Extra storage for Q
Activation record for Q
Extra storage for P
Activation record for P
Direction of growth

Memory organization for C program

A stack has 2 pointers - 2 pointers are 2 registers


 Stack pointer SP points to the particular position in the activation record of the currently
active procedure
 Top pointer points to the top of the stack

Organization of a C Activation record:


5 items are available in the activation record
 Value of actual parameters
 Count of number of arguments
 Return addresses
 Return value
 Value of SP for the activation record below
In addition to the local data, the activation record contain all 5 items

Local data
SP Old SP
Return value
Return address
Arg count
Actual parameters
Fig Activation record for a C prodcedure
- SP points to the position below the local data
- Hence local data can be accessed by a negative offset from SP and the parameters can be
accessed by +ve offset from SP
- In a c language, all the local variables including arrays are of fixed size.The size of the
activation record can be accessed by the compiler
The local name can be accessed by X[SP] where x stands for offset x
Procedure calls in C
Param T1
Param T2
.
.
Call P,n
Each Param, T instruction is converted to PUSH (P)
Push(p)
Top= top-1
*top=x
These instructions push T onto the runtime stack.

Translation of Call P,n


Store the argument count n, return address, old stack pointer and then jump to the procedure
called.
PUSH(n) store the argument count
PUSH(l1) l1 – label of return address
PUSH() leave space for return value
PUSH(SP) store old stack pointer
Goto l2 l2 is the first statement in the called procedure
 The 1st statement of the called procedure must be a special 3 address statement
procbegin
 This statement sets the stack pointer to the place holding oldsp and sets top to the top of
activation into 3 address stmt record
 SP=TOP
 Top= SP +sp (sp is the size of P, no. of words taken by the local data for proc P)
 Return (expn) in c is translated into return T
 The end of procedure is marked by a 3 address stmt. procend
10.2 Implementation of block structured languages

 In block structured languages, procedures and blocks may define their own data. Hence the
activation records or portion of the activation records must be reserved for blocks.
 Block structured language permits arrays of variable length
 Data referencing environment of procedure / block includes , all procedures and blocks
surrounding it in the program
Displays :
 Display is a method that will have direct access to nonlocal data
 It consists of an array of pointers to the currently accessible activation record

Format of activation record:


 The activation record contain space for local data, return address, return value, argument
count, actual parameter and old value of Sp.
 It also contain pointers for the display
 If adjustable arrays are permitted, their size cannot be calculated, until the procedure is
called.
 Hence, only pointers to the array and their data descriptors are placed in the activation
record
 The array and the descriptors are placed above the activation record between top and SP.
The descriptor can appear in the activation record itself.

EG:
Q calls R(X,Y) and R has the following declaration
integer i;;
real array A[ 0:n-1, 1:m];
real array B[2:10];
The fig shows the activation record for R
Activation record
Data descriptor and
elements of B
Data descriptor and
elements of A
Display pointer for MAIN
Pointer to A
Pointer to B
Local data
Value of i
Old sp
SP 
Return address
Arg count=2
Actual parameter y
Actual parameter x
Procedure calls
Suppose that we are currently executing a procedure Q and in that procedure we call R(x,y)
The level of Q [ number of procedures and blocks in static environment] is 3 - (main, P, Q)
So when Q is in execution, the display has 3 pointers to the top of activation records of main, P and
Q

Procedure R is at a level of 2 (main and itself)


Hence P and Q pointers must be replaced by a pointer to the record R

Procedure Main();
Procedure P(a);
Procedure Q(b);
R(x,y);
endQ;
Q(Z);
endP;
Procedure R(c,d);
endR;
….
…..
End Main;
Main();
Parameter passing

The operand of Param 3 address statement will be treated as a pointer to the value of the actual
parameter or as the value itself
Eg:
Suppose we call the procedure R (A+B*C, D) where A, B,C and D are integers
The translation to 3 address code is:
T1:= B*C
T2:=A+T1
Param T2
Param D
Call R, 2

The translation into object code of the statement:


Param X is PUSH(addr(X)) if call by reference is used,
PUSH(X) if call by value is used.

The translation of an array declaration such as


Real array A[0:n-1,1:m]
Enter information about A into symbol table, and generate code to calculate A’s size and data
descriptor at runtime.
Returns
When a return statement is encountered in Procedure Q (which in turn return to Procedure R), the
procedure Q must do :
 Calculate any return value and put the value in activation record
 Obtain the return address from Qs activation record
 Set TOP to the value it had before P called Q
 Set SP to the value of the old SP found in Q’s activation record
 Jump to the return address to complete the return

Blocks
Blocks are “Parameter less procedures”
Blocks may be given display pointers similar to procedures.
Procedure P(X,Y);
Integer I;
Real array A[40];
Begin
Integer I, J;
Integer array B[50];
.
.
Begin
Integer K; B1
.
. B2
Go to L;
End;
End;
.
.
L:
Begin
Real array C[60]; B3
.
.
End;
.
.
End P
The fig shows the activation record when B2 and B3 are active
B2 is active B3 is active

UNIT 5
CHAPTER 11
ERROR DETECTION AND RECOVERY
11.1 Errors

Properties of good error diagnostic system


 Error messages should pinpoint the errors
 Messages should be understandable by the user
Missing right parenthesis instead of OH 17
 Messages should be specific and they should localize the problem
ZAP not declared in procedure B1 rather than missing declaration
 Messages should not be redundant
If a variables A is not declared, it should be said once and not every time A appears in
the program
Sources of errors:
Errors can occur at every stage of the process
1. The design specification of the program may be wrong (design time errors)
2. The algorithm may be incorrect (algorithmic errors)
3. Programmer may introduce errors in implementing the algorithm (coding errors)
4. When the program is keyed on to cards, errors may occur (keypunching or transcription
errors)
5. Program may exceed machine limit or compiler limit (more number of dimensions or array
size larger) - program error
6. Compiler can insert errors while compiling (compiler errors)
Different types of transcription errors
1. Insertion of extra character or tokens (Insertion error)
2. Deletion of character or token (deletion error)
3. Replacement of a character by incorrect character (replacement error)
4. Transposition of 2 adjacent characters (transposition errors)

From the view of compiler, errors can be classified as 2 categories


 Syntactic errors
 Semantic errors
Syntactic errors: Errors in the Syntax
Errors detected by the lexical phase or syntactic phase of the compiler
Semantic errors: Errors in the semantics
Errors detected by the compiler both at compile time and run time

Syntactic errors:
 Missing right parenthesis (deletion errors) MIN(A,2*(3+B)
 Extra comma (insertion errors) DO 10, I=1,100
 Colon in place of ; (replacement errors) I=1: J=2;
 Misspelled keyword (transposition errors) PORCEDURE SUM
 Extra blank (insertion errors) /* COMMENT * /

In some cases it is difficult to say, where the errors have occurred.


We could not find the place of errors.
A=B-C*D+E)
Errors may not be noted for a long time
IfA = B then ….. (the detection of the error occurs after a long distance)
No place between IF and A

Minimum distance correction of syntactic errors


Way of defining errors and their location is called minimum hamming distance method
We define a collection of error transformations
We say that a program P has k errors, if the shortest sequence of error transformations that will map
any valid program into P has length k.

Eg: IFA = B THEN


SUM=SUM + A
ELSE
SUM=SUM – A;
In this example, insert one blank to get correct code. Hence the distance is 1
This method is very expensive. Hence rarely used
Insertion of one character or deletion of one character makes the code error free.
Minimum distance mean the least number of insertions, deletions, and symbol modifications
necessary to transform one string to another
Semantic errors:
Semantic Errors can be detected both at compile time and runtime
Compile time errors:
Errors of declaration ad scope can be detected at compile time
 Undeclared identifiers
 Multiply defined identifiers
 Type incompatibilities between operator and operands
 Type incompatibilities between formal and actual parameters
Type checking differs from one language to other:
 Some languages have only one data type. So no type misagreements [bliss]
 Some languages have many types. But the type of every name and expression should be
known at compile time. Strongly typed language [pascal]
 Some language do automatic type conversion between operators and operands of different
types.[PL/1]

Dynamic errors:
 In some languages, Some errors can be detected only at runtime. Some languages (APL,
SNOBOL) have several data types and type of a name can change at runtime. Hence type
checking cannot be done at compile time. They should be postponed till runtime
 Range checking can be done only at runtime. Range for checking array subscripts and case
stmt selection can be done only at runtime. [ subscript out of range, Arbitrary value in case
stmt]

Errors seen by each phase


Fig shows the plan of error detector/corrector portion of the compiler.
It contains routines to recover from lexical error and syntactic error
Also it contain routine to detect semantic errors and routines to print the diagnostics.
The diagnostic routine communicate with the symbol table, to avoid printing redundant messages.

Errors can be classified based on the phase in which the compiler detects them
Lexical Phase errors
Syntactic phase errors
Semantic phase errors
11.2 Lexical phase errors
[ function of lexical analyzer : break the input not stream of tokens]
If after some processing, the lexical analyzer discover that no prefix of the remaining input, fits
into any token classs, it calls error recovery routine. If the input does not match into any token
class, it calls error recovery routine
The problem of recovering the lexical phase errors may sometimes cause problems
Eg . IF ( long condition .OR. X .LT. Y ) goto 30
Assume that after X, the line split’s to next line with continuation mark.
Suppose if the continuation mark is wrongly entered in 8th column, , after reading X it finds the end
of statement
Statement does not match with any syntax. Here syntax error occurs, Parser must do recovery
Reduce the statement with IF to a nonterminal statement
Then the lexical analyser must process the continuation
1.LT. Y ) goto 30
1. real constant ( variable)
LT.  variable
Insert = sign in between (id=id)
.Y (delete . and try to find next token.)
Y (idn) --- it tries to add + id +id

Then ) cause syntax error.


At some point , the syntax error corrector will construct a completely erroneous statement and
proceed.
Deleting many characters from the source program can cause severe difficulties for the syntax
analyzer and next phases of the compiler
Minimum distance matching:
Minimum distance matching is used to correct the spelling of the token
Suppose we know that the next token is keyword
But the input contains X, which is not the keyword, then we can determine which keyword is
closest to X
If we suspect that an identifier X is misspelled because no declaration for x is in symbol table
We can search for an identifier in the symbol table that is closest to X
For finding the correct keyword or correct identifier from symbol table, minimum error distance
matching algorithm is used.
Easier to implement
Runs faster
Very expensive

11.3 Syntactic phase errors


Error detection and recovery is centered around syntax analysis.
In syntactic phase, the parser detects an error, when it has no legal move from its current position
To recover from error, the parser should locate the position of the error, correct the error, and
revise it’s configuration and then continue with parsing

Time of detection:
LR parsers have the valid prefix property
ie. they will announce the error as soon as they find a prefix of the input with invalid continuation
LR parsers announce the errors at the earliest
Advantage of a parser with valid prefix property
 Reports the error at the earliest
 Limits the amount of erroneous output passed to subsequent passes.

Panic mode :
In the panic mode of the recovery, the parser discard the input symbols until a synchronizing token
is encountered. (delimeter such as ; or end)
The parser then deletes stack entries until it finds an entry such that it can continue parsing
Advantage:
Simple to implement
It never get into infinite loop
Error recovery in operator precedence parsing;
There are 2 cases in which the operator precedence Parsing discovers syntactic errors
 If no precedence relation holds between the terminal on the top of the stack and current
input symbol
 If a handle is found, but there is no production with this handle as the right side
These 2 errors will be noted and recovered
Handling errors during reduction
1. Normally the checker checks and displays the following errors
If +or * is reduced, it checks whether non terminals appear on both sides. If not, error
message will be displayed.
MISSING EXPRESSION
2. If id is reduced, it checks whether there is no nonterminal to the right or left.
If so, 2 expressions not connected by operator message will be displayed
3. If ( ) is reduced, it checks whether there is a nonterminal between the parenthesis
If not, null expression between parenthesis will be displayed
4. Nonterminal should not appear on the sides of ( and ) Parenthesis
If so, 2 expressions not connected by operator message will be displayed

Handling shift reduce errors

In the precedence matrix, the blank entries will be filled with the names of error handling routines.
e1: ($ - e1 is called when the expression ends with left parenthesis
action : pop ( from stack
msg: Illegal left parenthesis

e2: id or ) is followed by id or (
[ id id, id(, )id, )( ]
action : insert + on the input
msg: missing operator

e3: $) [when the expression begins with right ) ]


[ id id, id(,, )id, )( ]
action : delete ) from the input
msg: illegal ) parenthesis

e4: $$ (when the expression is null)


[ id id, id(,, )id, )( ]
action : insert id on to the input
msg: missing expression

Error recovery in LR parsing


LR parser will detect an error when it consults the parsing action table and finds an error entry
Errors never result from GOTO table
Examine each error entry in the parsing table and decide on the possibility of getting those errors
An appropriate recovery procedure will be constructed
Automatic error recovery in YACC
In YACC, the user identifies the major nonterminals for programs , blocks or statements
The user then adds error productions of the form A  error α to the grammar
When the parser encounters an error, it finds the top most state on the stack. Which is of the form
Aerror α
Then it shifts the error, as if a token is found
The parser then discards the input symbols until it finds a symbol with which the parsing can
proceed
The semantic routine will not manipulate it and it calls the diagnostic message

Adhoc error recovery for LR parsers


In the LR parsing table, fill each blank entry in the ACTION field with a pointer to the error
routine.
e1 is called , when an operator + or * or $ sign was found instead of id or ( in the beginning of an
expression
missing operand
e2 is called , when there is extraneous ) parenthesis in the input
unbalanced right paranthesis
e3 is called, when instead of operator, id or ) is found
missing operator
e4 is called when the end of i/p is found instead of operator or right )
missing right paranthesis

Error recovery in LL parsing


The error correction strategies for LL parser is similar to LR parser
There is a panic mode of correction for LL parser
On finding the error, we skip on the input to the next synchronizing token such as ;
Pop the stack until a symbol X appears on the top of stack
If X is not the synchronizing token, then X have to be expanded.
Symbols in the stack should be popped in such a way that synchronizing token appears on the stack
top

11.4 Semantic errors


Primary sources of semantic errors:

 Undefined names
 Type incompatibility
We have to recover from such errors and we should suppress the duplicate error messages
When the first time, we encounter an undeclared name, we make an entry in the symbol table for
that name with necessary attributes
In addition to this, a flag is set to indicate that the entry was made due to semantic error and is not a
declaration
When the name appear next time, a check is made to determine whether a previous instance of
error msg is noted in the symbol table. If so, no new error message is printed
If not, error message is printed, and new erroneous usage is now added to the symbol table
Hence duplicate error messages are avoided.

You might also like