
COMPILER CONSTRUCTION
ANUM ALEEM
Lexical Analyzer
Token
Lexemes
Patterns
Lexical Analyzer Architecture: How tokens are recognized
Lexical Analyzer
Roles of the Lexical analyzer
Example of Lexical Analysis, Tokens, Non-Tokens
Examples of Tokens created <role, word>
Lexical Errors
Symbol Table in Compiler Design & its Attributes
Symbol Table
The symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables, i.e. it stores information about the scope and binding of names, and about instances of various entities such as variable and function names, classes, objects, etc.
•It is built during the lexical and syntax analysis phases.
•The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.
•It is used by the compiler to achieve compile-time efficiency.
Symbol Table Used by Compiler Phases
•It is used by the various phases of the compiler as follows:
• Lexical Analysis: Creates new entries in the table, for example entries for tokens.
• Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc.
• Semantic Analysis: Uses the available information in the table to check semantics, i.e. to verify that expressions and assignments are semantically correct (type checking), and updates the table accordingly.
• Intermediate Code Generation: Refers to the symbol table to know how much run-time storage is allocated and of what type; the table also helps in adding information about temporary variables.
• Code Optimization: Uses information present in the symbol table for machine-dependent optimization.
• Target Code Generation: Generates code using the address information of the identifiers present in the table.
Symbol Table
Each entry in the symbol table is associated with attributes that
support the compiler in different phases.
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
Symbol Table
Name | Type | Size | Dimension | Line of Declaration | Line of Usage | Address | Scope

main()
{
int x;
char y[5];
.
.
.
Symbol Table
Name | Type | Size      | Dimension | Line of Declaration | Line of Usage | Address     | Scope
x    | int  | 2/4 bytes | 0         | 3                   | 25            | Hexadecimal | Local
?    | ?    | ?         | ?         | ?                   | ?             | ?           | ?

main()
{
int x;
char y[5];
.
.
.
x = 5;
Information used by the compiler
from Symbol table:
•Data type and name
•Declaring procedures
•Offset in storage
•If the entry is a structure or record, a pointer to the structure table.
•For parameters, whether passing is by value or by reference.
•Number and type of arguments passed to function
•Base Address
Operations of Symbol table
The basic operations defined on a symbol table include:
• insert(name, attributes): add a name to the table together with its attributes
• lookup(name): search the table for a name and return its attributes
Possible Implementation Techniques
1.Linear List
2.Binary Search Tree
3.Hash Table
List (Arrays)
It is the simplest and most straightforward way of implementing a symbol table: a single array stores the names and their accompanying information.
•A pointer "available" is maintained past the end of all stored records, and new names are added in the order in which they arrive.
•To search for a name, we scan from the beginning of the list up to the available pointer; if the name is not found, we report the error "use of undeclared name".
•While inserting a new name, we must ensure that it is not already present; otherwise we report the error "multiply defined name".
•Insertion is fast, O(1), but lookup is slow for large tables: O(n) on average.
•The advantage is that it takes a minimum amount of space.
List (Arrays)
One advantage of the list organization is that it uses the least amount of space feasible, which suits simple compilers.
The list may be kept in one of two forms:
•Unordered list: used for a modest number of variables.
•Ordered list: the cost of insertion is high, but searching is straightforward.
Linked List
•This implementation uses a linked list: a link field is added to each record.
•Searching for names is done in the order indicated by the link fields.
•A pointer "First" is maintained to point to the first record of the symbol table.
•Insertion is fast, O(1), but lookup is slow for large tables: O(n) on average.
Trees (Binary Search Tree)
It is a more efficient method of organizing a symbol table. Each record now has two link fields, LEFT and RIGHT. To find NAME, the standard binary-search-tree search is used, where x is initially a reference to the root. All names are inserted as descendants of the root node, always maintaining the binary-search-tree property.
Hash Table
The hashing technique is well suited to searching and hence is commonly implemented in compilers.
•In the hashing scheme, two tables are maintained, a hash table and a symbol table; this is the most commonly used method of implementing symbol tables.
•The hash table is an array with index range 0 to size − 1. Its entries are pointers to names in the symbol table.
•To search for a name, we use a hash function that yields an integer between 0 and size − 1.
•Insertion and lookup can be made very fast: O(1).
•The advantage is quick search; the disadvantage is that hashing is complicated to implement.
•We use a hash function h such that h(NAME) is an integer between 0 and size − 1 to determine whether NAME is in the symbol table.
Why Hashing?
Content, especially on the internet, has grown enormously; it becomes impossible to find anything unless new data structures and algorithms for storing and accessing data are developed.
What is the problem with traditional data structures like arrays and linked lists?
◦ Sorted array → binary search → time complexity O(log n)
◦ Unsorted array → linear search → time complexity O(n)
◦ Either case may be undesirable if we need to process a very large data set.

Hashing is a technique that allows us to update and retrieve any entry in constant time, O(1). Constant-time (O(1)) performance means that the amount of time needed to perform the operation does not depend on the data size n.
Hash Table
Hash function
Problems:
•The number of possible keys is much larger than the space available in the table.
•Keys may not be numeric.
•Different keys may map to the same location:
◦ the hash function is not one-to-one ⇒ collision;
◦ if there are too many collisions, the performance of the hash table will suffer dramatically.
Hash function examples for integer keys
Truncation: If students have a 9-digit identification number, take the last 3 digits as the table position.
◦ E.g. 925376622 becomes 622

Folding: Split a 9-digit number into three 3-digit numbers, and add them.
◦ E.g. 925376622 becomes 925 + 376 + 622 = 1923

Modular arithmetic: If the table size is 1000, the first example always stays within the table range, but the second does not; it should be taken mod 1000.
◦ E.g. 1923 mod 1000 = 923 (1923 % 1000)
Hash Function
The hash function is the mapping between an item and the place in the hash table
where that item resides. The hash function takes a collection item and returns an integer
in the range of slot names from 0 to m-1.
Example:
Assume that we have the set of integer items 59, 28, 93, 18, 77, and 31.
Consider m = 11 (the number of slots in the hash table).
Hash Function (cont.)
Suppose we want to insert the above elements into the slots using the formula h(item) = item mod m:
59 mod 11 = 4
28 mod 11 = 6
93 mod 11 = 5
18 mod 11 = 7
77 mod 11 = 0
31 mod 11 = 9
After inserting these elements, the hash table looks as follows:
Slot: 0   1  2  3  4   5   6   7   8  9   10
Item: 77  -  -  -  59  93  28  18  -  31  -
Collision Resolution
If, when an element is inserted, it hashes to the same value as an
already inserted element, then we have a collision and need to
resolve it.
There are several methods for dealing with this:
◦ Separate chaining
◦ Open addressing
  – Linear probing
  – Quadratic probing
  – Double hashing
Collision Resolution Techniques (CRT)
• Open addressing
  ◦ Linear probing
  ◦ Quadratic probing
  ◦ Double hashing
• Closed addressing
  ◦ Separate (list) chaining
Separate Chaining
The idea is to keep a list of all elements that hash to the same value.
◦ The array elements are pointers to the first nodes of the lists.
◦ A new item is inserted at the front of its list.

Advantages:
◦ Better space utilization for large items.
◦ Simple collision handling: search the linked list.
◦ Overflow: we can store more items than the hash table size.
◦ Deletion is quick and easy: delete from the linked list.
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10
Resulting chains (new items inserted at the front):
0: 0
1: 81 → 1
2: (empty)
4: 64 → 4
5: 25
6: 36 → 16
7: (empty)
9: 49 → 9
Example
table[0]:  Adams → Arvin
table[1]:  (empty)
table[2]:  Chris
...
table[25]: (empty)
Operations
Initialization:
◦ All entries are set to NULL
Search:
◦ Locate the cell using hash function.
◦ Sequential search on the linked list in that cell.

Insertion:
◦ Locate the cell using hash function.
◦ (If the item does not exist) insert it as the first item in the list.

Deletion:
◦ Locate the cell using hash function.
◦ Delete the item from the linked list.
Open addressing
Separate chaining has the disadvantage of using linked lists.
◦ Requires the implementation of a second data structure.

In an open addressing hashing system, all the data go inside the table.

If a collision occurs, alternative cells are tried until an empty cell is found.

Open Addressing
There are three common collision resolution strategies:
◦ Linear Probing
◦ Quadratic probing
◦ Double hashing

Linear Probing
In linear probing, collisions are resolved by sequentially scanning the array (with wraparound) until an empty cell is found.

Thus, once 77 collides with 52 at location 2, we simply put 77 in position 3.

Hash table (-1 marks an empty cell):
Index: 0    1    2    3    4    ...  24
Value: 500  -1   52   77   129  ...  49
Linear Probing
Hash table:
Index: 0    1    2    3    4    5    ...  24
Value: 500  -1   52   77   129  102  ...  49

To insert 102, we follow the probe sequence consisting of locations 2, 3, 4, and 5 to find the first available location, and thus store 102 in table[5].

Note: If the search reaches the end of the table, we continue at the first location.
Linear Probing
To determine whether a specified value is in the hash table, we first apply the hash function to compute the position at which this value should be found.

One of the following cases then applies:
◦ The location is empty.
◦ The location contains the specified value.
◦ The location contains some other value: begin a circular linear search until either the item is found, we reach an empty location, or we return to the starting location.
Linear Probing -- Example
Example:
◦ Table size is 11 (0..10)
◦ Hash function: h(x) = x mod 11
◦ Insert keys: 20, 30, 2, 13, 25, 24, 10, 9
◦ 20 mod 11 = 9
◦ 30 mod 11 = 8
◦ 2 mod 11 = 2
◦ 13 mod 11 = 2 → 2+1 = 3
◦ 25 mod 11 = 3 → 3+1 = 4
◦ 24 mod 11 = 2 → 2+1, 2+2, 2+3 = 5
◦ 10 mod 11 = 10
◦ 9 mod 11 = 9 → 9+1, (9+2) mod 11 = 0
Resulting table:
0: 9 | 1: - | 2: 2 | 3: 13 | 4: 25 | 5: 24 | 6: - | 7: - | 8: 30 | 9: 20 | 10: 10
Linear Probing -- Clustering Problem

One of the problems with linear probing is that table items tend to cluster together in the hash table.
◦ i.e. the table contains groups of consecutively occupied locations.

This phenomenon is called primary clustering.
◦ Clusters can get close to one another and merge into a larger cluster.
◦ Thus, one part of the table might be quite dense even though another part has relatively few items.
◦ Primary clustering causes long probe searches and therefore decreases the overall efficiency.
Clustering Problem
As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.

Larger table sizes are preferred.

Studies suggest using tables whose capacity is approximately 1.5 to 2 times the number of items to be stored.
Access Time of the Symbol Table
Data Structure        | Insertion Time | Lookup Time
Unordered Array       | O(1)           | O(n)
Ordered Array         | O(n)           | O(log n)
Unordered Linked List | O(1)           | O(n)
Ordered Linked List   | O(n)           | O(n)
Search Tree           | O(log n)       | O(log n)
Hash Table            | O(1)           | O(1)
