
Unit-V: Symbol Tables

The Contents of a Symbol Table: A compiler uses a symbol table to store scope and attribute
information about names. Each name has a value and attributes. On the logical or abstract level, the
value is the contents of the placeholder with that name; it may be numerical or, if the name is a
procedure, it could be an input-output relationship. The information stored includes:
1. The string of characters denoting the name. If the same identifier can be used in more than one
block or procedure, then the scope of the name (i.e. which block or procedure the name belongs to)
should also be specified.
2. Attributes of the name such as data type, scope, etc., and information regarding the usage of the
name (e.g. label, formal parameter, array, etc.).
3. Parameters such as the number of dimensions of an array and the upper and lower limits of each
dimension.
4. An offset indicating the position of the name in storage.

Various operations that can be performed on a symbol table are:

1. Determine whether a given name is in the table.
2. Add a new name to the table.
3. Access the information associated with a given name.
4. Add new information for a given name.
5. Delete a name or a group of names from the table.
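
As a rough illustration, the five operations above can be mapped onto a small Python class. This is only a sketch: the class name SymbolTable, its method names, and the use of a built-in dict are assumptions made for the example, not something the notes prescribe.

```python
class SymbolTable:
    """Toy symbol table: maps each name to a dict of attributes."""

    def __init__(self):
        self._entries = {}

    def contains(self, name):            # operation 1
        return name in self._entries

    def add(self, name):                 # operation 2
        if name in self._entries:
            raise KeyError(f"{name!r} is already in the table")
        self._entries[name] = {}

    def info(self, name):                # operation 3
        return self._entries[name]

    def set_info(self, name, **attrs):   # operation 4
        self._entries[name].update(attrs)

    def delete(self, *names):            # operation 5
        for name in names:
            self._entries.pop(name, None)


table = SymbolTable()
table.add("sort")
table.set_info("sort", kind="procedure")
table.add("a")
table.set_info("a", kind="array", dims=1)
```

A real compiler would replace the dict with one of the data structures discussed later in this unit.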

Keywords may be entered into the symbol table initially. Alternatively, if the lexical analyzer
recognizes the keywords itself, they need not be entered into the symbol table.
If the size of the symbol table is fixed when the compiler is written, then the size must be large enough
to handle any source program presented. Such a fixed-size table may still be insufficient for large
programs and wastefully big for small ones. Therefore, it is useful if the symbol table can grow at
compile time.
The symbol table entry for a declaration of a name is stored as a record. Since the information stored
about a name depends upon its usage, the format of entries may not be uniform. To keep symbol table
records uniform, we may put some information about a name outside the table entry, with only a pointer
to this information stored in the record.

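Characters in a name can be stored as follows: If there is a maximum limit on the number of characters
in a name, the name can be stored in a fixed-size field of the symbol table entry, as in Fig. a. If there is
no limit on the number of characters in a name, the characters are kept in a separate array, each name
ending with an end-of-string marker (EOS), and the entry holds a pointer to the first character, as in
Fig. b.

Fig. a: Names stored in a fixed-size field of the entry

Name        Attributes
S o r t
a

Fig. b: Names stored in a separate character array

Name        Attributes
(pointer)
(pointer)

Character array: S o r t EOS a EOS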
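
The unlimited-length scheme can be sketched as follows. The class name NamePool, its methods, and the choice of "\0" as the EOS marker are assumptions for illustration only; a symbol table entry would then store the returned start index in place of the characters themselves.

```python
EOS = "\0"  # assumed end-of-string marker


class NamePool:
    """One shared character array holding every name, each ended by EOS."""

    def __init__(self):
        self.chars = []

    def store(self, name):
        """Append name + EOS and return the index of its first character."""
        start = len(self.chars)
        self.chars.extend(name)
        self.chars.append(EOS)
        return start

    def fetch(self, start):
        """Read characters from start up to (not including) the next EOS."""
        end = self.chars.index(EOS, start)
        return "".join(self.chars[start:end])


pool = NamePool()
sort_ptr = pool.store("Sort")   # characters at indices 0-3, EOS at index 4
a_ptr = pool.store("a")         # character at index 5, EOS at index 6
```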

The entry for an array a in a language that does not limit the number of array dimensions is as follows,
where LLi and ULi are the lower and upper limits of dimension i:
Name        Attributes

LL1  LL2  ...  LLn
UL1  UL2  ...  ULn

Storage allocation information: Information about the storage locations that names will occupy at run
time is stored in the symbol table.
In the case of static storage, the position of each data object relative to a fixed origin, such as the
beginning of an activation record, is decided at compile time.
In the case of dynamic storage, such as names whose storage is allocated on a stack or heap, the compiler
allocates no storage. The compiler only plans the activation record for each procedure.
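
To make the idea of a compile-time offset concrete, here is a minimal sketch that assigns each declared name a position relative to the start of its activation record. The function name assign_offsets, the type names, and the sizes (4 bytes for int, 8 for real) are assumptions chosen for the example, and alignment padding is deliberately ignored.

```python
def assign_offsets(declarations, sizes):
    """Give each name the next free offset from the record's origin.

    declarations: list of (name, type_name) pairs in declaration order.
    sizes: assumed size in bytes of each type.
    Returns (offset of every name, total size of the record).
    """
    offset = 0
    layout = {}
    for name, type_name in declarations:
        layout[name] = offset
        offset += sizes[type_name]
    return layout, offset


SIZES = {"int": 4, "real": 8}  # hypothetical machine sizes
layout, record_size = assign_offsets(
    [("x", "int"), ("y", "real"), ("z", "int")], SIZES
)
```

The offsets are what would be stored in item 4 of the symbol table contents listed earlier.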

Data structures for symbol tables: The following data structures can be used for symbol tables:
1. linear list, 2. hash table.
1. Linear list: A linear list is the simplest to implement, but its performance is poor: the time
required to add n entries and make e enquiries is high. We use a single array, or equivalently
several arrays, to store the linear list of records. New names are added to the list in the order in
which they are seen. To retrieve information about a name, we search from the beginning of the
array up to the position marked by the pointer Available, which shows where the next entry will go.
Id1    Info1
Id2    Info2
...
Idn    Infon
Available -> (next free position)
Suppose the symbol table contains n names. To insert a name, the work done is proportional to n,
because we must first check whether an entry for that name already exists. To find data about a name,
the work done is also proportional to n (on average, n/2 entries are examined). Therefore the total work
done to insert n names and make e enquiries is at most cn(n+e), where c is a machine-dependent
constant. For large n and e, the list data structure is therefore very inefficient.
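
The linear list and its cost can be sketched as below. The class name LinearListTable and the comparisons counter are assumptions added so the amount of work can be observed; the search runs from the beginning of the list, as described above.

```python
class LinearListTable:
    """Symbol table kept as a list of (name, info) records in arrival order."""

    def __init__(self):
        self.records = []      # len(self.records) plays the role of Available
        self.comparisons = 0   # counts name comparisons, to show the cost

    def _find(self, name):
        for index, (stored, _) in enumerate(self.records):
            self.comparisons += 1
            if stored == name:
                return index
        return -1

    def insert(self, name, info=None):
        # The whole list is scanned first to check for an existing entry.
        if self._find(name) == -1:
            self.records.append((name, info))

    def lookup(self, name):
        index = self._find(name)
        return None if index == -1 else self.records[index][1]


table = LinearListTable()
for name in "abcd":
    table.insert(name, {"id": name})
```

Inserting these four distinct names costs 0 + 1 + 2 + 3 = 6 comparisons, which grows quadratically with the number of names.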
2. Hash table: There are different hashing techniques. A simple technique called “open hashing”
can be used for the symbol table (“open” means that there is no limit on the number of entries
that can be made in the table).
The basic hashing scheme has two parts: i. a hash table consisting of a fixed array of m pointers to
table entries; ii. table entries organized into m separate linked lists called buckets. Each record in the
symbol table appears on exactly one of these lists.
Fig. Array of list headers indexed by hash value. Headers shown: 9, 15, 20; names on the lists:
capital, b, a.

To find an entry for string s in the symbol table, we apply the hash function h to s; h(s) returns an
integer between 0 and m-1. If k = h(s) and s is in the symbol table, then it is found on list k. If s is not
in the symbol table, it is entered by creating a record for s, which is linked at the front of list k.
With n names spread over m lists, the average list is n/m records long, so the total work done to insert
n names and make e enquiries is cn(n+e)/m, where c is a machine-dependent constant.
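
A minimal sketch of this bucket scheme follows. The class name ChainedHashTable, the character-sum hash function, and the use of Python lists in place of linked lists are all assumptions for illustration; any function returning an integer between 0 and m-1 could serve as h.

```python
class ChainedHashTable:
    """m list headers; each bucket holds the records that hash to its index."""

    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def h(self, s):
        # Assumed toy hash: sum of character codes modulo m.
        return sum(ord(ch) for ch in s) % self.m

    def find(self, s):
        for record in self.buckets[self.h(s)]:
            if record["name"] == s:
                return record
        return None

    def enter(self, s):
        """Return the record for s, creating it at the front of its list."""
        record = self.find(s)
        if record is None:
            record = {"name": s}
            self.buckets[self.h(s)].insert(0, record)
        return record


table = ChainedHashTable(m=8)
for name in ["a", "b", "i"]:
    table.enter(name)
```

With this particular hash and m = 8, "a" and "i" both land on list 1, and "i" sits in front because it was entered later.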

Representing Scope information: Declaration statements cause entries to be made in the symbol table.
When a name is used in a program, the symbol table should return the entry made by the appropriate
declaration, i.e. the one whose scope contains the use.
One simple approach is to maintain a separate symbol table for each scope. Information for the
nonlocals of a procedure is found by scanning the symbol tables of the enclosing procedures.
An alternative approach integrates the symbol table with the intermediate code. In this approach,
information about the locals of a procedure can be attached to the node for the procedure in a syntax
tree for the program.
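
The first approach (one table per scope, with nonlocals found by scanning the enclosing scopes) might look like the sketch below. The class name Scope and its methods are assumptions for illustration, and the block names are hypothetical.

```python
class Scope:
    """One symbol table per scope, linked to the table of the enclosing scope."""

    def __init__(self, name, enclosing=None):
        self.name = name
        self.enclosing = enclosing
        self.symbols = {}

    def declare(self, ident, **attrs):
        self.symbols[ident] = attrs

    def resolve(self, ident):
        """Search this scope, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if ident in scope.symbols:
                return scope.name, scope.symbols[ident]
            scope = scope.enclosing
        return None


outer = Scope("P")
outer.declare("x", type="int")
outer.declare("y", type="int")
inner = Scope("Q", enclosing=outer)
inner.declare("y", type="real")   # hides the y declared in P
```

Here inner.resolve("y") finds the declaration in Q, while inner.resolve("x") falls through to P.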

Symbol table for block structured languages: In a block structured language, blocks or procedures may
be nested in other blocks or procedures (e.g. Pascal). A block can access its own data as well as data
declared in the enclosing blocks, unless the same data item is redeclared within the block itself.

B1: begin
      Declare X,Y,Z
      B2: begin
            Declare Y;
            B3: begin
                  ...
            End B3;
            I=10;
      End B2;
      ...
end B1;

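Since the program blocks are properly nested inside one another, the symbol table can be structured as
a stack. When a block closes, its entries can be removed from the stack. We use the following
organization for the symbol table. This organization works well even if the language allows a name to
be declared after it is used: a name that is used before any declaration has been seen is entered as not
yet defined, and if it is still undefined when its block closes, its entry passes to the enclosing block.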
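Each block record holds the block id, the enclosing block id, the number of entries, and a pointer to the
block's identifier entries. Each identifier entry holds the identifier, a Defined? flag (/ = defined,
x = not yet defined), and its attributes.

Fig. a: Symbol table before processing the statement END B2

Block id   Enclosing block id   No. of entries   Identifiers   Defined?
B1         -                    3                X             /
                                                 Y             /
                                                 Z             /
B2         B1                   2                Y             /
                                                 I             x

Fig. b: Symbol table after processing the statement END B2

Block id   Enclosing block id   No. of entries   Identifiers   Defined?
B1         -                    4                X             /
                                                 Y             /
                                                 Z             /
                                                 I             x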
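
The step from Fig. a to Fig. b can be sketched as follows. The class name StackSymbolTable and its method names are assumptions for illustration; each stack element stands for one block record, and each entry maps an identifier to its Defined? flag.

```python
class StackSymbolTable:
    """Stack of open blocks; each block maps identifier -> defined flag."""

    def __init__(self):
        self.blocks = []  # list of (block_id, entries); the top is the last item

    def begin_block(self, block_id):
        self.blocks.append((block_id, {}))

    def declare(self, name):
        self.blocks[-1][1][name] = True

    def use(self, name):
        # A use seen before any declaration is recorded as not yet defined.
        self.blocks[-1][1].setdefault(name, False)

    def end_block(self):
        """Pop the block; pass still-undefined names to the enclosing block."""
        _, entries = self.blocks.pop()
        unresolved = [n for n, defined in entries.items() if not defined]
        if not self.blocks:
            return unresolved  # nothing encloses the outermost block
        outer = self.blocks[-1][1]
        for name in unresolved:
            outer.setdefault(name, False)
        return []


st = StackSymbolTable()
st.begin_block("B1")
for name in ("X", "Y", "Z"):
    st.declare(name)
st.begin_block("B2")
st.declare("Y")
st.begin_block("B3")
st.end_block()                     # End B3
st.use("I")                        # I=10;
before = dict(st.blocks[-1][1])    # B2's entries, as in Fig. a
st.end_block()                     # End B2
after = dict(st.blocks[-1][1])     # B1's entries, as in Fig. b
```

After End B2, the entry for Y declared in B2 is gone and the undefined I has become the fourth entry of B1.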

The above symbol table organization cannot be used for multipass compilation: entries for a block
cannot be erased when the block closes, since subsequent passes may need the information contained
in these entries. Thus the symbol table for a multipass compiler cannot be maintained as a stack.
We modify the above stack organization with the help of the “enclosing block id” field. This field is
actually redundant in the stack organization, where the enclosing block is simply the one below on the
stack. In the modified organization, entries are not physically deleted when a block closes. As a result,
the entries for a block are not always contiguous in the symbol table. Therefore, all entries belonging to
the same block are linked together.

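Block id   Enclosing block id   No. of entries
B1         -                    4
B2         B1                   1

Identifier   Defined?   Link to next entry of the same block
X            /          Y (of B1)
Y            /          Z
Z            /          I
Y            /          -
I            x          -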
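
A possible sketch of this linked organization is given below. The class name MultipassTable, the field names, and the relinking helper are assumptions for illustration; entries are never deleted, and closing a block only moves its still-undefined entries onto the chain of the enclosing block. For brevity the sketch does not check whether the enclosing block already has an entry with the same name.

```python
class MultipassTable:
    """Entries are never erased; entries of one block are chained by 'next'."""

    def __init__(self):
        self.entries = []  # each: {"name", "defined", "next"}
        self.blocks = {}   # block id -> {"enclosing", "first", "count"}

    def open_block(self, block_id, enclosing=None):
        self.blocks[block_id] = {"enclosing": enclosing, "first": None, "count": 0}

    def chain(self, block_id):
        """Indices of the block's entries, in link order."""
        indices, i = [], self.blocks[block_id]["first"]
        while i is not None:
            indices.append(i)
            i = self.entries[i]["next"]
        return indices

    def _relink(self, block_id, indices):
        block = self.blocks[block_id]
        block["first"] = indices[0] if indices else None
        block["count"] = len(indices)
        for here, nxt in zip(indices, indices[1:] + [None]):
            self.entries[here]["next"] = nxt

    def add(self, block_id, name, defined):
        self.entries.append({"name": name, "defined": defined, "next": None})
        self._relink(block_id, self.chain(block_id) + [len(self.entries) - 1])

    def close_block(self, block_id):
        outer = self.blocks[block_id]["enclosing"]
        if outer is None:
            return
        members = self.chain(block_id)
        kept = [i for i in members if self.entries[i]["defined"]]
        moved = [i for i in members if not self.entries[i]["defined"]]
        self._relink(block_id, kept)
        self._relink(outer, self.chain(outer) + moved)


mt = MultipassTable()
mt.open_block("B1")
for name in ("X", "Y", "Z"):
    mt.add("B1", name, defined=True)
mt.open_block("B2", enclosing="B1")
mt.add("B2", "Y", defined=True)
mt.add("B2", "I", defined=False)
mt.close_block("B2")
```

After close_block("B2"), all five entries are still present, B1's chain runs X, Y, Z, I, and B2's chain holds only its own Y.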

Hash table organization:

Block id   Enclosing block id   No. of entries
B1         -                    3
B2         B1                   2

Hash table entry   Defined?
I2                 x
X1                 /
Y2                 /
Z1                 /
Y1                 /

One identifier can denote different names, e.g. Y in the given program. Therefore, in the above hash
table organization, it is necessary to concatenate the block number with each symbol defined in that
block. When a use of symbol S in block n is seen, we first search for symbol Sn in the symbol table. If it
is not found, we search for symbol Sm, where m is the number of the block that immediately encloses
block n, and so on. In this organization, too, we link all entries of a block together, since the actions
concerning uses of undeclared symbols (i.e. symbols global to the block) are performed on
encountering the end of the block. For example, in the figure above, I2 is converted into I1 when block
B2 ends.
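
The search order Sn, Sm, ... and the end-of-block conversion can be sketched with a Python dict, which is itself a hash table. The function names lookup and close_block and the "defined" field are assumptions for illustration; a (symbol, block number) tuple stands in for the concatenated key such as Y2.

```python
def lookup(table, enclosing, symbol, block):
    """Try (symbol, block), then the enclosing block, and so on outward."""
    while block is not None:
        if (symbol, block) in table:
            return (symbol, block)
        block = enclosing[block]
    return None


def close_block(table, enclosing, block):
    """At the end of a block, re-key its undefined symbols to the outer block."""
    outer = enclosing[block]
    if outer is None:
        return
    for symbol, blk in list(table):
        if blk == block and not table[(symbol, blk)]["defined"]:
            entry = table.pop((symbol, blk))
            table.setdefault((symbol, outer), entry)


enclosing = {1: None, 2: 1}  # block 2 (B2) is nested in block 1 (B1)
table = {
    ("X", 1): {"defined": True},
    ("Y", 1): {"defined": True},
    ("Z", 1): {"defined": True},
    ("Y", 2): {"defined": True},
    ("I", 2): {"defined": False},
}
y_in_b2 = lookup(table, enclosing, "Y", 2)   # found as Y2
x_in_b2 = lookup(table, enclosing, "X", 2)   # not X2, so found as X1
close_block(table, enclosing, 2)             # I2 becomes I1
```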
