14 Symbol Table

The Symbol Table
used during all phases of compilation
maintains information about many source language constructs
incrementally constructed and expanded during the analysis phases
used directly in the code generation phases
efficient storage and access are important in practice (but we won't worry about efficiency; we'll just use a linked list)
may or may not be constructed during lexical and syntax analysis, depending on the compiler
Constructing the Symbol Table
There are three main operations to be carried out on the symbol table:
determining whether a string has already been stored
inserting an entry for a string
deleting a string when it goes out of scope
This requires three functions:
lookup(s): returns the index of the entry for string s, or -1 if there is no entry
insert(s,t): adds a new entry for string s (of token t), and returns its index
delete(s): deletes s from the table (or, typically, hides it)
A simple implementation
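Figure: a linked-list symbol table. Each node holds an index (e.g. 7), a token (e.g. ID_T), atts (a pointer to an attribute structure), strPtr (the position of the lexeme in a string array) and next (a pointer to the next node). The table header records the first and last nodes and the length (78 in the figure, with nodes 1 to 78 all holding ID_T). The lexemes themselves are stored end to end in a single character array, each terminated by '#':
c o u n t # i # ...
n a m e # ...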
Declarations
There are four kinds of entity that may require an entry in the symbol table:
Constant, e.g. const int MAX = 10000;
Variable, e.g. int count, marks[100];
Type (user-defined), e.g. struct Entry { int index; char *strPtr; };
Function, e.g. int gcd(int n, int m) { if (m == 0) return n; else return gcd(m, n % m); }
The attributes represented in the table will depend on the object being declared. All four will typically have a type signature, representing the data type or (for functions) the return type. Constants may have value bindings. Variables may have pointers to memory locations. Functions may have a pointer to their code segment. All four may have scope information. In some compilers, separate symbol tables are used for each different kind of declaration; in others, each separate region of the program (e.g. each function) may be given a separate table.
Scope
In most high-level languages, variables and functions have restricted scope, i.e. they can only be accessed in specific areas of the source code. The scope of any particular variable may be global, or within a specific code file, or in a file after its declaration, or within specific code blocks. In C, blocks are files, function declarations and compound statements (between "{" and "}"). Also, structures and unions can be considered to be blocks.
In languages with restrictive scoping rules, it is possible to construct the symbol table during lexical analysis.
{L}+    { entry = lookup(yytext);
          if (entry == -1)          /* i.e. new ID_T */
              insert(yytext, ID_T);
        }
Scoping Rules
In block structured languages, the same variable name can be used in different places to refer to different objects. We now cannot simply look to see if the name has already been entered in the table, as the current use may be a new declaration.
int i;             /* i is globally accessible */
int f1(int k)      /* a new integer k, in f1 only */
{
    int j;         /* a new integer j, in f1 only */
    ...
    print i;       /* (the global variable) */
}
int f2()
{
    int j;         /* a different j, in f2 only */
    ...
}
Scope and the Symbol Table
In languages with nested scope, the symbol table functions are more complex.
lookup must find the most recently inserted declaration, i.e. search for a declaration of the identifier that is valid in the current scope.
insert must not overwrite previous declarations, but make them inaccessible.
delete should hide the most recent declaration and uncover the previous one.
The symbol table should thus behave as a stack.
It is still possible to construct the symbol table during the first pass of the compiler if explicit nesting levels are associated with each entry in the table. Many compilers prefer to make multiple passes over the source code, first constructing a syntax tree, and then constructing the table once the nested structure of the code is known.
One-pass symbol table construction

One possible method of constructing the symbol table during the first pass is shown below.
Prog  -> Dec Prog
Prog  -> Main
Dec   -> VDec ;
Dec   -> FDec
VDec  -> int id
FDec  -> SFDec Par ) { CStat }      { decr(stack); }
SFDec -> int id (                   { incr(stack); }
Par   ->
Par   -> VDec
Par   -> PList , VDec
PList -> VDec
PList -> VDec , PList
{L}+    { entry = lookup(yytext, stack);
          if (entry == -1)
              insert(yytext, ID_T, stack);
        }
The stack consists of entries of the form (nesting level, scope value). Each symbol table entry also gets two extra fields, Nest and Scope, copied from the top of the stack when the entry is inserted.
last is the index of the last entry added to the symbol table.
Initially, the stack is set to <(0,0)> and last to 0.
insert associates the top of the stack with the new entry.
lookup searches the table, most recent entry first, for a matching entry and obtains its nesting level. It then moves down the stack until it finds a stack entry with the same nesting level. If the table index is less than that stack entry's scope value, it ignores the match and continues searching the table; if no stack entry has that nesting level, the declaration's scope has closed and the match is likewise ignored. If no valid match is found, it returns -1.
decr deletes the top element of the stack.
incr adds a new element to the top of the stack, with the nesting level incremented by one and the last index as the scope value.
Figure: parse tree for the example program int i; int f1(int k) { int j; ... print i; } int f2() { int j; ... }. The incr and decr events are numbered on the tree: 1 (incr, at the SFDec for f1), 2 (decr, at the FDec for f1), 3 (incr, at the SFDec for f2) and 4 (decr, at the FDec for f2).
Index   0   1   2   3   4   5
Str     i   f1  k   j   f2  j
Nest    0   0   1   1   0   1
Scope   0   0   1   1   0   4
Atts    ...
The changes in the stack are as follows (top on the right):
Event       Last   Stack (Nest,Scope)
(initial)   0      (0,0)
1           1      (0,0), (1,1)
2           3      (0,0)
3           4      (0,0), (1,4)
4           5      (0,0)
Syntax trees and scope

Prog
    VDec
        int
        id i
    func
        int
        id f1
        VDec int id k
        VDec int id j
        print id i
    func
        int
        id f2
        P (empty parameter list)
        VDec int id j
Many compilers simply build a syntax tree on the first pass (while carrying out lexical and syntax analysis). They then make a second pass, constructing the symbol table, checking data types, etc. It should be easier to determine the scope of the identifiers from the syntax tree. Multiple passes may be slower, but they can result in more natural grammars and simpler translation and analysis routines.
