
The Symbol Table

used during all phases of compilation

maintains information about many source language constructs

incrementally constructed and expanded during the analysis phases

used directly in the code generation phases

efficient storage and access are important in practice (but we won't worry about efficiency; we'll just use a linked list)

may or may not be constructed during lexical and syntax analysis, depending on the compiler

Constructing the Symbol Table

There are three main operations to be carried out on the symbol table:

determining whether a string has already been stored

inserting an entry for a string

deleting a string when it goes out of scope

This requires three functions:

lookup(s): returns the index of the entry for string s, or 0 if there is no entry

insert(s,t): adds a new entry for string s (of token t), and returns its index

delete(s): deletes s from the table (or, typically, hides it)

A simple implementation
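[Figure: the symbol table as a linked list of nodes plus a string array]

Each node in the list has five fields:

index: the number of the entry (e.g. 7)

token: the token type (e.g. ID_T)

atts: pointer to the attribute structure

strPtr: position in the string array

next: pointer to the next node

The initial node of the table holds first, length (78 in the example) and last. The entries are numbered 1, 2, ..., 78, each with token ID_T.

The strings are stored end to end in a single character array, each terminated by #:

    c o u n t # i # ... n a m e # ...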
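The layout in the diagram can be sketched in C roughly as follows. This is a minimal sketch, not the course's actual code: the token value for ID_T, the capacity STR_MAX and the opaque struct Atts are assumptions, and '\0' is used as the string terminator in place of the '#' shown in the diagram so that the standard string functions can be used. Only lookup and insert are shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ID_T 256                 /* hypothetical token code for identifiers */
#define STR_MAX 1024             /* hypothetical capacity of the string array */

struct Atts;                     /* attribute structure, left opaque in this sketch */

struct Entry {
    int index;                   /* position in the table, counted from 1 */
    int token;                   /* token type, e.g. ID_T */
    struct Atts *atts;           /* attribute structure (unused here) */
    char *strPtr;                /* position of the string in the string array */
    struct Entry *next;          /* next node in the list */
};

/* Initial node of the table: first entry, last entry and number of entries. */
static struct { struct Entry *first, *last; int length; } table;

/* Strings stored end to end; '\0' plays the role of '#' in the diagram. */
static char strings[STR_MAX];
static size_t strUsed;

/* Return the index of the entry for s, or 0 if there is no entry. */
int lookup(const char *s)
{
    assert(s != NULL);
    for (struct Entry *e = table.first; e != NULL; e = e->next)
        if (strcmp(e->strPtr, s) == 0)
            return e->index;
    return 0;
}

/* Add a new entry for s (of token t); return its index, or 0 if out of space. */
int insert(const char *s, int t)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;            /* characters plus terminator */
    if (n > STR_MAX - strUsed)
        return 0;
    struct Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    memcpy(strings + strUsed, s, n);     /* copy the lexeme into the string array */
    e->strPtr = strings + strUsed;
    strUsed += n;
    e->index = ++table.length;
    e->token = t;
    e->atts = NULL;
    e->next = NULL;
    if (table.last != NULL)              /* append at the end of the list */
        table.last->next = e;
    else
        table.first = e;
    table.last = e;
    return e->index;
}
```

Because indices start at 1, the value 0 is free to mean "no entry", which matches the description of lookup above.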

Declarations
There are four kinds of entity that may require an entry in the symbol table:

Constant, e.g.

    const int MAX = 10000;

Variable, e.g.

    int count, marks[100];

Type (user-defined), e.g.

    struct Entry { int index; char *strPtr; };

Function, e.g.

    int gcd(int n, int m) {
        if (m == 0) return n;
        else return gcd(m, n % m);
    }

The attributes represented in the table will depend on the object being declared.

All four will typically have a type signature, representing the data type or (for functions) the return type.

Constants may have value bindings.

Variables may have pointers to memory locations.

Functions may have pointers to code segments.

All four may have scope information.

In some compilers, separate symbol tables are used for each different kind of declaration; in others, each separate region of the program (e.g. functions) may be given a separate table.
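One way to hold attributes that differ by kind of declaration is a tagged union. The sketch below is only an illustration: every name in it (Kind, BaseType, the field names and the make_ helpers) is hypothetical, and a real compiler would need a much richer representation of type signatures.

```c
#include <assert.h>
#include <stddef.h>

/* The four kinds of declared entity. */
enum Kind { K_CONSTANT, K_VARIABLE, K_TYPE, K_FUNCTION };

/* Hypothetical, very coarse type signatures. */
enum BaseType { T_INT, T_CHAR, T_STRUCT };

struct Atts {
    enum Kind kind;
    enum BaseType type;          /* data type, or return type for a function */
    int nestLevel;               /* scope information */
    union {
        long value;              /* constant: value binding */
        size_t address;          /* variable: memory location (as an offset) */
        size_t codeOffset;       /* function: start of its code segment */
    } u;                         /* a user-defined type uses none of these */
};

/* Build the attributes of a constant such as: const int MAX = 10000; */
struct Atts make_constant(enum BaseType type, long value, int nestLevel)
{
    struct Atts a = { K_CONSTANT, type, nestLevel, { 0 } };
    a.u.value = value;
    return a;
}

/* Build the attributes of a variable placed at the given offset. */
struct Atts make_variable(enum BaseType type, size_t address, int nestLevel)
{
    struct Atts a = { K_VARIABLE, type, nestLevel, { 0 } };
    a.u.address = address;
    return a;
}

/* Build the attributes of a function whose code starts at codeOffset. */
struct Atts make_function(enum BaseType returnType, size_t codeOffset, int nestLevel)
{
    struct Atts a = { K_FUNCTION, returnType, nestLevel, { 0 } };
    a.u.codeOffset = codeOffset;
    return a;
}
```

The kind field tells the later phases which member of the union is meaningful for a given entry.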

Scope

In most high-level languages, variables and functions have restricted scope - i.e. they can only be accessed in specific areas of the source code. The scope of any particular variable may be global, or within a specific code file, or in a file after its declaration, or within specific code blocks. In C, blocks are files, function declarations and compound statements (between "{" and "}"). Also, structures and unions can be considered to be blocks.
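The kinds of block in C can be seen in a small example. The names limit, Pair, scale and read_global are made up for illustration.

```c
#include <assert.h>

int limit = 10;                  /* file scope: visible from here to the end of the file */

struct Pair { int limit; };      /* a structure acts as a block: this member is a separate name */

int scale(int limit)             /* the parameter hides the file-scope limit inside the function */
{
    int result = limit * 2;      /* uses the parameter */
    {                            /* compound statement: a nested block */
        int limit = 1;           /* hides the parameter until the closing brace */
        result += limit;
    }
    return result + limit;       /* the parameter is visible again */
}

int read_global(void)
{
    return limit;                /* no local declaration, so this is the file-scope limit */
}
```

Each use of limit refers to the declaration in the innermost enclosing block, which is the behaviour a symbol table for such a language has to reproduce.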

In languages with restrictive scoping rules, it is possible to construct the symbol table during lexical analysis.

    {L}+    { entry = lookup(yytext);
              if (entry == 0)        /* i.e. new ID_T */
                  insert(yytext, ID_T); }
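For readers without lex to hand, the effect of the rule can be imitated by a hand-written scanner. The following is a rough sketch under several assumptions: {L} is taken to mean a letter, the table is a small fixed array rather than the linked list described earlier, the capacities and the token value of ID_T are hypothetical, and lookup follows the earlier description in returning 0 when there is no entry.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define ID_T 256                 /* hypothetical token code for identifiers */
#define MAX_ENTRIES 64           /* hypothetical capacities */
#define MAX_LEN 32

static char names[MAX_ENTRIES + 1][MAX_LEN];   /* slot 0 unused: 0 means "no entry" */
static int length;

/* Return the index of the entry for s, or 0 if there is no entry. */
int lookup(const char *s)
{
    for (int i = 1; i <= length; i++)
        if (strcmp(names[i], s) == 0)
            return i;
    return 0;
}

/* Add a new entry for s and return its index (the token t is not stored here). */
int insert(const char *s, int t)
{
    (void)t;
    assert(length < MAX_ENTRIES);
    assert(strlen(s) < MAX_LEN);
    strcpy(names[++length], s);
    return length;
}

/* Treat each maximal run of letters in src as a match of {L}+ and apply
   the action of the rule to it. Returns the number of new entries made. */
int scan(const char *src)
{
    int inserted = 0;
    while (*src != '\0') {
        if (!isalpha((unsigned char)*src)) {   /* skip anything that is not a letter */
            src++;
            continue;
        }
        char text[MAX_LEN];                    /* plays the role of yytext */
        size_t n = 0;
        while (isalpha((unsigned char)*src)) {
            if (n < MAX_LEN - 1)               /* overlong names are truncated */
                text[n++] = *src;
            src++;
        }
        text[n] = '\0';
        if (lookup(text) == 0) {               /* i.e. new ID_T */
            insert(text, ID_T);
            inserted++;
        }
    }
    return inserted;
}
```

Note that this sketch would also enter keywords such as int into the table; in a lex specification the keyword rules would normally be listed before the identifier rule.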

Scoping Rules
In block structured languages, the same variable name can be used in different places to refer to different objects. We now cannot simply look to see if the name has already been entered in the table, as the current use may be a new declaration.

    int i;               /* i is globally accessible */
    int f1(int k) {      /* a new integer k, in f1 only */
        int j;           /* a new integer j, in f1 only */
        ...
        print i;         /* (the global variable) */
    }
    int f2() {
        int j;           /* a different j, in f2 only */
        ...
    }

Scope and the Symbol Table

In languages with nested scope, the symbol table functions are more complex.

lookup must find the most recently inserted declaration; i.e. search for a declaration of the identifier valid in the current scope.

insert must not overwrite previous declarations, but make them inaccessible.

delete should hide the most recent declaration and uncover the previous one.

The symbol table should thus behave as a stack.

It is still possible to construct the symbol table during the first pass of the compiler if explicit nesting levels are associated with each entry in the table. Many compilers prefer to make multiple passes over the source code, first constructing a syntax tree, and then constructing the table once the nested structure of the code is known.
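The stack-like behaviour of the three functions can be sketched as follows. This is an illustration only: the fixed-size array, the hidden flag and the name delete_entry are assumptions made for the sketch, and here lookup returns -1 rather than 0 for a missing name because the indices start at 0.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 256          /* hypothetical fixed capacity */

struct Entry {
    const char *name;            /* not copied: the caller keeps the string alive */
    int token;
    int hidden;                  /* nonzero once the declaration has been deleted */
};

static struct Entry entries[MAX_ENTRIES];
static int count;

/* Find the most recently inserted declaration of s that is still
   visible. Returns its index, or -1 if there is none. */
int lookup(const char *s)
{
    for (int i = count - 1; i >= 0; i--)
        if (!entries[i].hidden && strcmp(entries[i].name, s) == 0)
            return i;
    return -1;
}

/* Add a declaration without overwriting earlier ones; an earlier
   declaration of the same name becomes inaccessible to lookup. */
int insert(const char *s, int t)
{
    assert(count < MAX_ENTRIES);
    entries[count].name = s;
    entries[count].token = t;
    entries[count].hidden = 0;
    return count++;
}

/* Hide the most recent visible declaration of s, which uncovers
   the previous one (if any). */
void delete_entry(const char *s)
{
    int i = lookup(s);
    if (i >= 0)
        entries[i].hidden = 1;
}
```

Searching from the newest entry backwards is what makes an inner declaration take priority over an outer one with the same name.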

One-pass symbol table construction


One possible method of constructing the symbol table during the first pass is shown below.

    Prog  -> Dec Prog
    Prog  -> Main
    Dec   -> VDec ;
    Dec   -> FDec
    VDec  -> int id
    FDec  -> SFDec Par ) { CStat }      decr(stack);
    SFDec -> int id (                   incr(stack);
    Par   -> (empty)
    Par   -> VDec
    Par   -> PList , VDec
    PList -> VDec
    PList -> VDec , PList

    {L}+    { entry = lookup(yytext,stack);
              if (entry == -1)
                  insert(yytext,ID_T,stack); }

The stack consists of entries of the form (nesting level, scope value). The nesting level and scope value are also stored as extra fields in each entry of the symbol table.

last is the index of the last entry added to the symbol table.

Initially, the stack is set to < (0,0) > and last to 0.

insert associates the top of the stack with the entry.

lookup searches for a matching entry, and obtains its nesting level. It moves down the stack until it finds a stack entry with the same nesting level. If the table index is less than the stack scope value, it ignores the entry, and continues searching the table. If no match is found, it returns -1.

decr deletes the top element of the stack.

incr adds a new element to the top of the stack, increments the nesting level, and assigns the last index as the scope value.
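The scheme described above can be sketched in C as follows. Several details are assumptions not fixed by the description: the table is searched from the newest entry backwards, an entry whose nesting level does not appear on the stack at all is treated as invisible, and the capacities, the token value of ID_T and the helper names init, last_index and see_identifier are hypothetical. see_identifier stands in for the action attached to {L}+.

```c
#include <assert.h>
#include <string.h>

#define ID_T 256                 /* hypothetical token code */
#define MAX_ENTRIES 256          /* hypothetical capacities */
#define MAX_DEPTH 32

struct Scope { int nest, scope; };            /* (nesting level, scope value) */
struct ScopeStack { struct Scope items[MAX_DEPTH]; int top; };
struct Entry { const char *str; int token, nest, scope; };

struct Entry table[MAX_ENTRIES];
int count;                       /* number of entries in the table */

/* Index of the last entry added; 0 while the table is still empty. */
static int last_index(void) { return count > 0 ? count - 1 : 0; }

/* Empty the table and set the stack to < (0,0) >. */
void init(struct ScopeStack *st)
{
    count = 0;
    st->top = 0;
    st->items[0] = (struct Scope){ 0, 0 };
}

/* Push a new element: nesting level one deeper, scope value = last. */
void incr(struct ScopeStack *st)
{
    assert(st->top + 1 < MAX_DEPTH);
    int nest = st->items[st->top].nest + 1;
    st->top++;
    st->items[st->top] = (struct Scope){ nest, last_index() };
}

/* Delete the top element of the stack. */
void decr(struct ScopeStack *st)
{
    assert(st->top > 0);
    st->top--;
}

/* Add an entry tagged with the (nest, scope) pair on top of the stack. */
int insert(const char *s, int t, const struct ScopeStack *st)
{
    assert(count < MAX_ENTRIES);
    struct Scope top = st->items[st->top];
    table[count] = (struct Entry){ s, t, top.nest, top.scope };
    return count++;
}

/* Return the index of the visible entry for s, or -1 if no match is found. */
int lookup(const char *s, const struct ScopeStack *st)
{
    for (int i = count - 1; i >= 0; i--) {
        if (strcmp(table[i].str, s) != 0)
            continue;
        int k = st->top;         /* move down the stack to the same nesting level */
        while (k >= 0 && st->items[k].nest != table[i].nest)
            k--;
        if (k < 0)
            continue;            /* assumption: no such open level, so not visible */
        if (i < st->items[k].scope)
            continue;            /* entered before this scope was opened: ignore */
        return i;
    }
    return -1;
}

/* The action attached to {L}+: look the name up, insert it if it is new. */
int see_identifier(const char *s, struct ScopeStack *st)
{
    int entry = lookup(s, st);
    if (entry == -1)
        entry = insert(s, ID_T, st);
    return entry;
}
```

For the example program, the sequence would be see_identifier for i and f1, then incr, then k, j and i, then decr, then f2, incr, j and decr. In the final step for j, the entry at index 3 is found first but rejected because 3 is less than the scope value 4, so a new entry is made at index 5.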

[Figure: parse tree for the example program. The four stack events are marked on the tree: 1 at SFDec for f1, 2 at FDec for f1, 3 at SFDec for f2, 4 at FDec for f2.]

Constructed symbol table:

    Index  Str  Nest  Scope  Atts
    0      i    0     0      ...
    1      f1   0     0
    2      k    1     1
    3      j    1     1
    4      f2   0     0
    5      j    1     4

The changes in the stack are as follows (top on the right):

    Event  Last  Stack (Nest,Scope)
           0     (0,0)
    1      1     (0,0), (1,1)
    2      3     (0,0)
    3      4     (0,0), (1,4)
    4      5     (0,0)


Syntax trees and scope


[Figure: syntax tree for the example program]

    Prog
        VDec
            int
            id i
        func
            int
            id f1
            VDec int id k
            VDec int id j
            print id i
        func
            int
            id f2
            (empty)
            VDec int id j

Many compilers simply build a syntax tree on the first pass (while carrying out lexical and syntax analysis). They then make a second pass, constructing the symbol table, checking data types, etc. It should be easier to determine the scope of the identifiers from the syntax tree. Multiple passes may be slower, but they can result in more natural grammars, and simpler translation and analysis routines.
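A second pass of this kind might look roughly like the sketch below, which walks a syntax tree, enters declarations as it meets them and resolves each use against the declarations visible at that point. The node kinds, field names and helper functions are all hypothetical, and the table here is just a stack of currently visible names, which is simpler than the table built in the one-pass method.

```c
#include <assert.h>
#include <string.h>

#define MAX_VISIBLE 256          /* hypothetical capacity */

enum NodeKind { N_PROG, N_VDEC, N_FUNC, N_USE };

struct Node {
    enum NodeKind kind;
    const char *id;              /* declared or used name; NULL for N_PROG */
    struct Node *child;          /* first child */
    struct Node *sibling;        /* next node at the same level */
    int declNest;                /* N_USE only: nesting level of the matching
                                    declaration, or -1 if there is none */
};

struct Entry { const char *name; int nest; };
static struct Entry visible[MAX_VISIBLE];     /* stack of visible declarations */
static int count;

static void declare(const char *name, int nest)
{
    assert(count < MAX_VISIBLE);
    visible[count].name = name;
    visible[count].nest = nest;
    count++;
}

/* Nesting level of the innermost visible declaration of name, or -1. */
static int resolve(const char *name)
{
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(visible[i].name, name) == 0)
            return visible[i].nest;
    return -1;
}

/* Second pass: walk n and its following siblings at nesting level nest. */
void build(struct Node *n, int nest)
{
    for (; n != NULL; n = n->sibling) {
        switch (n->kind) {
        case N_PROG:
            build(n->child, nest);
            break;
        case N_VDEC:
            declare(n->id, nest);
            break;
        case N_FUNC: {
            declare(n->id, nest);          /* the function name lives in the outer scope */
            int mark = count;              /* remember where the function's scope starts */
            build(n->child, nest + 1);     /* parameters, locals and statements */
            count = mark;                  /* leaving the function hides its locals */
            break;
        }
        case N_USE:
            n->declNest = resolve(n->id);
            break;
        }
    }
}
```

Because the tree already reflects the nesting of the program, the scope of each function is simply the extent of the recursive call on its children, and no scope stack has to be maintained by the grammar actions.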
