
SCHOOL OF INFORMATION AND

COMMUNICATION TECHNOLOGY

COMPILER DESIGN LAB


AI 383

NAME- ANUSHKA SRIVASTAVA


ROLL NO- 215/UAI/031
BRANCH- B.TECH AI
SEM-5th
INDEX

S.no Program Date Signature

1. Practice of LEX/YACC of Compiler writing.
2. Write a program to check whether a string belongs to grammar or not.
3. Write a program to generate a parse tree.
4. Write a program to find leading terminals.
5. Write a program to find trailing terminals.
6. Write a program to compute FIRST of non-terminals.
7. Write a program to compute FOLLOW of non-terminals.

1. Practice of LEX/YACC of Compiler writing.

Introduction-

Some of the most time-consuming and tedious parts of writing a compiler involve the
lexical scanning and syntax analysis. Luckily, there is freely available software to assist in
these functions. While these tools will not do everything for you, they enable faster
implementation of the basic functions. Lex and Yacc are the most commonly used
packages, with Lex managing token recognition and Yacc handling the syntax. They
work well together, but can also be used individually.
Both operate in a similar manner: instructions for token recognition or grammar rules
are written in a special file format. The text files are then read by lex and/or yacc to
produce C code, and this resulting source code is compiled to make the final application. In
practice the lexical instruction file has a “.l” suffix and the grammar file has a “.y” suffix.
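
For reference, a typical build sequence looks like the following (the file names scanner.l and parser.y are only illustrative; on some systems the lex runtime library must also be linked with -ll):

lex scanner.l                    # generates lex.yy.c
yacc -d parser.y                 # generates y.tab.c and y.tab.h
cc lex.yy.c y.tab.c -o myparser  # compile the generated C code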

LEX-
The file format for a lex file consists of (4) basic sections:
• The first is an area for C code that will be placed verbatim at the beginning of the
generated source code. Typically it will be used for things like #include, #define, and
variable declarations.
• The next section is for definitions of token patterns to be recognized. These are not
mandatory, but in general they make the rules section easier to read and shorter.
• The third section sets the pattern for each token that is to be recognized, and can also
include C code to be called when that token is identified.
• The last section is for more C code (generally subroutines) that will be appended to the
end of the generated C code. This would typically include a main function if lex is to be
used by itself.
• The format is applied as follows (the use and placement of the % symbols are
necessary):
%{
//header c code
%}
//definitions
%%
//rules
%%
//subroutines
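
As a minimal, self-contained sketch of this layout (the token patterns, actions, and the word_count variable are only illustrative, not part of the lab exercises), a lex file that classifies numbers and words might look like this:

%{
#include <stdio.h>
int word_count = 0; /* illustrative counter */
%}
DIGIT [0-9]
%%
{DIGIT}+   { printf("NUMBER: %s\n", yytext); }
[a-zA-Z]+  { word_count++; printf("WORD: %s\n", yytext); }
[ \t\n]    ; /* skip whitespace */
.          { printf("UNKNOWN: %s\n", yytext); }
%%
int main(void) { yylex(); printf("words: %d\n", word_count); return 0; }
int yywrap(void) { return 1; }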

YACC-
The format for a yacc file is similar, but includes a few extras.
• One area (introduced by %token) is a list of terminal symbols. You do not need to
list single-character ASCII symbols, but anything else, including multi-character
symbols (e.g. “==”), needs to be in this list.
• The next is an area for C code that will be placed verbatim at the beginning of the
generated source code. Typically it will be used for things like #include, #define, and
variable declarations.
• The next section is for definitions; none of the following examples utilize this area.
• The fourth section gives the grammar rule for each construct that is to be recognized,
and can also include C code to be called when that rule is matched.
• The last section is for more C code (generally subroutines) that will be appended to the
end of the generated C code. This would typically include a main function if yacc is to be
used by itself.
• The format is applied as follows (the use and placement of the % symbols are
necessary):

%token RESERVED WORDS GO HERE


%{
//header c code
%}
//definitions
%%
//rules
%%
//subroutines
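
As a minimal, self-contained sketch of this layout (the grammar and the hand-written yylex are only illustrative, not part of the lab exercises), a yacc file that accepts sums of numbers might look like this:

%token NUMBER
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%%
line : expr '\n' { printf("valid expression\n"); }
     ;
expr : expr '+' NUMBER
     | NUMBER
     ;
%%
/* A hand-written yylex so the example stands alone; normally lex provides it. */
int yylex(void) {
    int c = getchar();
    while (c == ' ') c = getchar();
    if (isdigit(c)) {
        while (isdigit(c = getchar()))
            ;
        ungetc(c, stdin);
        return NUMBER;
    }
    return c == EOF ? 0 : c;
}
int main(void) { return yyparse(); }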

These formats and general usage will be covered in greater detail in the following (4)
sections. In general it is best not to modify the resulting C code, as it is overwritten each
time lex or yacc is run. Most desired functionality can be handled within the lexical and
grammar files, but there are some things that are difficult to achieve and may require
editing of the C file.
As a side note, the functionality of these programs has been duplicated by the GNU open
source projects Flex and Bison. These can be used interchangeably with Lex and Yacc for
everything this document will cover, and for most other uses as well.

2. Write a program to check whether a string belongs to grammar or not.

def is_valid_string(input_string):
    # Checks membership in the balanced 'a'/'b' language, e.g. the
    # grammar S -> a S b S | epsilon ('a' opens, 'b' closes).
    stack = []
    for char in input_string:
        if char == 'a':
            stack.append('a')
        elif char == 'b':
            if not stack:
                return False  # 'b' without a corresponding 'a'
            stack.pop()
        else:
            return False  # invalid character
    return not stack  # True only if every 'a' was matched by a 'b'

# Test the program
input_string = input("Enter a string: ")
if is_valid_string(input_string):
    print("The string belongs to the grammar.")
else:
    print("The string does not belong to the grammar.")

OUTPUT-

3. Write a program to generate a parse tree.

class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

def generate_parse_tree(expression):
    # Builds a simple nested sketch of the expression; it does not apply
    # operator precedence, it just descends on '+', '*' and '('.
    tokens = expression.replace(" ", "")  # remove spaces; single-character tokens only
    root = Node("Expression")
    current_node = root
    stack = [root]

    for token in tokens:
        if token.isdigit():
            current_node.children.append(Node("Number: " + token))
        elif token in ['+', '*']:
            # Record the operator, then descend into a fresh subtree
            # for the right-hand operand.
            current_node.children.append(Node("Operator: " + token))
            new_node = Node("Expression")
            current_node.children.append(new_node)
            stack.append(new_node)
            current_node = new_node
        elif token == '(':
            new_node = Node("Expression")
            current_node.children.append(new_node)
            stack.append(new_node)
            current_node = new_node
        elif token == ')':
            stack.pop()
            if stack:
                current_node = stack[-1]

    return root

def print_parse_tree(node, depth=0):
    if node is not None:
        print(" " * depth + node.value)
        for child in node.children:
            print_parse_tree(child, depth + 1)

# Example usage:
expression = "3 + 4 * (5 + 2)"
parse_tree = generate_parse_tree(expression)
print_parse_tree(parse_tree)
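
Tracing the code by hand on the example expression, print_parse_tree should produce the following (one leading space per depth level):

Expression
 Number: 3
 Operator: +
 Expression
  Number: 4
  Operator: *
  Expression
   Expression
    Number: 5
    Operator: +
    Expression
     Number: 2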

OUTPUT-

4. Write a program to find leading terminals.

def find_leading_terminals(grammar):
    # LEADING(A): terminals that can appear at the start of a string derived from A.
    leading_terminals = {non_terminal: set() for non_terminal in grammar}

    # Iterate to a fixpoint so that a rule like S -> Ab picks up
    # LEADING(A) even when 'A' is defined after 'S' in the dictionary.
    changed = True
    while changed:
        changed = False
        for non_terminal, productions in grammar.items():
            for production in productions:
                first_symbol = production[0]
                if first_symbol.islower():  # terminal symbol
                    new_symbols = {first_symbol}
                elif first_symbol in grammar:  # non-terminal symbol
                    new_symbols = leading_terminals[first_symbol]
                else:
                    continue
                if not new_symbols <= leading_terminals[non_terminal]:
                    leading_terminals[non_terminal] |= new_symbols
                    changed = True

    return leading_terminals

# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
    'S': ['Ab', 'Bc', 'd'],
    'A': ['a'],
    'B': ['b']
}

leading_terminals = find_leading_terminals(grammar)

print("Leading Terminals:")
for non_terminal, terminals in leading_terminals.items():
    print(f"{non_terminal}: {terminals}")

OUTPUT-

5. Write a program to find trailing terminals.

def find_trailing_terminals(grammar):
    # TRAILING(A): terminals that can appear at the end of a string derived from A.
    trailing_terminals = {non_terminal: set() for non_terminal in grammar}

    # Iterate to a fixpoint so that a rule like S -> aA picks up
    # TRAILING(A) even when 'A' is defined after 'S' in the dictionary.
    changed = True
    while changed:
        changed = False
        for non_terminal, productions in grammar.items():
            for production in productions:
                last_symbol = production[-1]
                if last_symbol.islower():  # terminal symbol
                    new_symbols = {last_symbol}
                elif last_symbol in grammar:  # non-terminal symbol
                    new_symbols = trailing_terminals[last_symbol]
                else:
                    continue
                if not new_symbols <= trailing_terminals[non_terminal]:
                    trailing_terminals[non_terminal] |= new_symbols
                    changed = True

    return trailing_terminals

# Example usage:
# Grammar: S -> aA | bB | c, A -> d, B -> e
grammar = {
    'S': ['aA', 'bB', 'c'],
    'A': ['d'],
    'B': ['e']
}

trailing_terminals = find_trailing_terminals(grammar)

print("Trailing Terminals:")
for non_terminal, terminals in trailing_terminals.items():
    print(f"{non_terminal}: {terminals}")

OUTPUT-

6. Write a program to compute FIRST of non-terminals.

def compute_first_sets(grammar):
    first_sets = {non_terminal: set() for non_terminal in grammar}

    for non_terminal, productions in grammar.items():
        for production in productions:
            first_symbol = production[0]
            if first_symbol.islower():  # terminal symbol
                first_sets[non_terminal].add(first_symbol)
            elif first_symbol in grammar:  # non-terminal symbol
                first_sets[non_terminal] |= compute_first_sets_helper(grammar, first_symbol)

    return first_sets

def compute_first_sets_helper(grammar, symbol):
    # Recursively collect FIRST(symbol). Assumes the grammar has no
    # left recursion (which would make this recursion loop forever).
    first_set = set()

    for production in grammar[symbol]:
        first_symbol = production[0]
        if first_symbol.islower():  # terminal symbol
            first_set.add(first_symbol)
        elif first_symbol in grammar:  # non-terminal symbol
            first_set |= compute_first_sets_helper(grammar, first_symbol)

    return first_set

# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
    'S': ['Ab', 'Bc', 'd'],
    'A': ['a'],
    'B': ['b']
}

first_sets = compute_first_sets(grammar)

print("FIRST Sets:")
for non_terminal, first_set in first_sets.items():
    print(f"{non_terminal}: {first_set}")

OUTPUT-

7. Write a program to compute the FOLLOW of non-terminals.

def compute_follow_sets(grammar, start_symbol):
    follow_sets = {non_terminal: set() for non_terminal in grammar}

    # Add '$' (end-of-input marker) to the follow set of the start symbol.
    follow_sets[start_symbol].add('$')

    while True:
        prev_follow_sets = {non_terminal: set(follow_set)
                            for non_terminal, follow_set in follow_sets.items()}

        for non_terminal, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol in grammar:
                        remaining_symbols = production[i + 1:]
                        first_of_remaining = compute_first_of_string(grammar, remaining_symbols)

                        # Everything in FIRST(remainder) except epsilon
                        # (written '') belongs to FOLLOW(symbol).
                        follow_sets[symbol] |= first_of_remaining - {''}

                        # If nothing follows the symbol, or the remainder can
                        # derive epsilon, FOLLOW(non_terminal) also applies.
                        if not remaining_symbols or '' in first_of_remaining:
                            follow_sets[symbol] |= follow_sets[non_terminal]

        # Stop once the follow sets have converged.
        if follow_sets == prev_follow_sets:
            break

    return follow_sets

def compute_first_of_string(grammar, symbols):
    first_set = set()

    for symbol in symbols:
        if symbol.islower():  # terminal symbol
            first_set.add(symbol)
            break
        elif symbol in grammar:  # non-terminal symbol
            first_set |= compute_first_of_string_helper(grammar, symbol)
            if '' not in first_set:
                break
        else:  # unknown symbol: stop
            break

    return first_set

def compute_first_of_string_helper(grammar, symbol):
    first_set = set()

    for production in grammar[symbol]:
        first_symbol = production[0]
        if first_symbol.islower():  # terminal symbol
            first_set.add(first_symbol)
        elif first_symbol in grammar:  # non-terminal symbol
            first_set |= compute_first_of_string_helper(grammar, first_symbol)

    return first_set

# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
    'S': ['Ab', 'Bc', 'd'],
    'A': ['a'],
    'B': ['b']
}

start_symbol = 'S'

follow_sets = compute_follow_sets(grammar, start_symbol)

print("FOLLOW Sets:")
for non_terminal, follow_set in follow_sets.items():
    print(f"{non_terminal}: {follow_set}")

OUTPUT-
