Professional Documents
Culture Documents
Tsan-sheng Hsu
tshsu@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/~tshsu
1
Denitions
Symbol table: A data structure used by a compiler to keep track of semantics of variables.
Data type. When is used:
scope.
Possible implementations:
Unordered list: for a very small set of variables. Ordered linear list: insertion is expensive, but implementation is
relatively easy. Binary search tree: O(log n) time per operation for n variables. Hash table: most commonly used, and very ecient provided the memory space is adequately larger than the number of variables.
Hash Table
Hash function h(n): returns a value from 0, . . . , m 1, where n is the input name and m is the hash table size.
Uniform and randomized.
remainder of it divided by m. Add up a linear combination of integer values of characters in a name, and then
Resolving collisions:
Linear resolution: try (h(n) + 1) mod m for m being a prime number. Chaining.
Open hashing. Keep a chain on the items with the same hash value. Most popular.
Quadratic-rehashing:
try (h(n) + 12) mod m, and then try (h(n) + 22) mod m, . . . , try (h(n) + i2) mod m.
Uniformly distributed.
Data type. Scope information: where it can be used. Storage allocation, size, . . .
Too little: names must be short. Too much: waste a lot of spaces.
NAME a r
ATTRIBUTES a y
s a r i
o e 2
r a
t d r
Variable-length name:
A string of space is used to store all names. For each name, store the length and starting index of each name.
NAME index length 0 5 5 2 7 10 17 3 ATTRIBUTES
0 s
1 o
2 r
3 t
4 $
5 a
6 $
7 r
8 e
9 a
10 d
11 a
12 r
13 r
14 a
15 y
16 $
17 i
18 2
19 $
Handling block-structures
Nested block means nested scope. Example (C language code)
main() { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ ... /* x and y can only be used here */ /* H used here is float */ ... } /* close an old scope */ ... /* H used here is integer */ ... { char A,C,M; /* parse point C */ ... } }
Use a stack to maintain the current scope. Search top of stack rst. If not found, search the next one in the stack. Use the rst match. Note: a popped scope can be destroyed in a one pass compiler, but it must be saved in a multi-pass compiler.
main() { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ ... /* x and y can only be used here */ /* H used here is float */ ... } /* close an old scope */ S.T. for ... H, A, L /* H used here is integer */ ... { char A,C,M; /* parse point C */ ... parse point A } }
searching direction
parse point B
parse point C
parse point C
10
Example (PASCAL code): A, R: record A: integer X: record A: real; C: boolean; end end ... R.A := 3; /* means R.A := 3; */ with R do A := 4; /* means R.A := 4; */
Compiler notes #5, Tsan-sheng Hsu, IIS 11
boolean
Associate a record number within the eld names. Assign record number #0 to names that are not in records. A bit time consuming in searching the symbol table. Similar to the scope numbering technique.
Compiler notes #5, Tsan-sheng Hsu, IIS 12
number and continue to search. If everything fails, search the normal main symbol table.
13
Overloading (1/3)
A symbol may, depending on context, mean more than one thing. Example:
operators: I := I + 3; X := Y + 1.2; function call return value and recursive function call: f := f + 1;
14
Overloading (2/3)
Implementation:
Link together all possible denitions of an overloading name. Call this an
15
Overloading (3/3)
Example: PASCAL function name and return variable.
Within the function body, the two denitions are chained. i.e., function call and return variable. When the function body is closed, the return variable denition disap-
pears.
[PASCAL] function f: integer; begin if global > 1 then f := f +1; return end
16
GOTO labels:
If labels must be dened before its usage, then one-pass compiler
17
the type declaration; all type names can then be looked up and resolved. [PASCAL] type link = ^ cell; cell = record info: integer; next: link; end;
18
Structural equivalence:
Express a type denitions using a directed graph using nodes as entries. Two types are equivalent if and only if their structures (graphs) are the same. A dicult job for compilers.
Name equivalence:
Two types are equivalent if and only if their names are the same. An easy job for compilers, but the coding takes more time.
Symbol table is needed during compilation, might also be needed during debugging.
19
How to use?
Dene symbol tbale routines:
Find in symbol table(name,scope):
return not found or an entry in the symbol table
check whether a name within a particular scope is currently in the symbol table or not.
Insert into sumbol table(name,scope) Return the newly created entry. Delete from sumbol table(name,scope)
Grammar productions:
Declaration: D T L { insert each name in $2.namelist into symbol table, allocate sizeof ($1.type) bytes, error for duplicated names} T int{$$.type = int} L id, L {insert the new name into $3.namelist and put it in $$.namelist} | id {create a list of one name $$.namelist} Allocate global and temperatory data space at the end of code. P program end {printf(GDATA:\n); printf(nbytes %d\n,total Gsize); printf(TDATA:\n); printf(nbytes %d\n,total Tsize); }
Compiler notes #5, Tsan-sheng Hsu, IIS 20
21