
Compiler Construction

COMPILER TOOLKIT
PROPOSAL

2022

SUBMITTED BY:
ALEEZA ANJUM (20-CS-101)
MOAZ ALI (20-CS-69)

SUBMITTED TO:
DR. ABID RAUF

This Project proposal is submitted to the University of Engineering & Technology Taxila
in Partial Fulfillment of the Requirements for the Award of the Degree of Bachelor of
Science in Computer Science.
DECLARATION
I ………………………………………………………………………….., hereby declare
that everything proposed in this project proposal is based on my own knowledge and
research carried out with exception.

SIGNATURE………………………………………DATE
……………………………………
APPROVAL
This project proposal has been presented for examination with my approval as the
supervisor.

Supervisor: Dr. Abid Rauf

SIGNATURE………………………………………DATE ……………………………
ABSTRACT:

In recent decades, computerized systems have been widely used in various environments,
such as real-time systems and monitoring systems. These applications need immediate
responses from the programmable system. The compiler phases are at the heart of any
programming language implementation, so improving the compiler makes execution more
efficient. In this proposal we present a Compiler Toolkit that handles common grammar
problems, namely left recursion and left factoring, and validates strings with a
recursive descent parser. The toolkit also computes the FIRST and FOLLOW sets of an
input grammar, validates strings with an SLR parser, and builds the canonical collection
of LR(0) items and the SLR parsing table. All of these results are calculated at run
time from the given input.

INTRODUCTION

A compiler is a language translator that takes as input a program written in a high-level
language and produces an equivalent program in a low-level language. In the process of
translation, a compiler goes through several phases:
· Lexical Analysis (also called Scanning)
· Syntax Analysis (also called Parsing)
· Semantic Analysis (also called Type Checking)
· Intermediate Code Generation
· Code Optimization
· Code Generation

Lexical Analysis is the very first phase of a compiler. The lexer takes the source code,
which is written in the form of sentences, and converts its sequence of characters into
a sequence of tokens. It also removes any extra whitespace and comments written in the
source code.
If the lexical analyzer finds an invalid token, it generates an error. The lexical analyzer
works closely with the syntax analyzer: it reads character streams from the source code,
checks for legal tokens, and passes them to the syntax analyzer on demand.
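
The tokenizing step described above can be sketched as follows. This is a minimal illustration in Python, not the toolkit's actual lexer: the token classes in TOKEN_SPEC and the function name tokenize are assumptions chosen for the example, and the patterns cover only a small C-like subset.

```python
import re

# Hypothetical token classes for illustration; a real lexer would have more.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*|/\*.*?\*/"),   # line and block comments
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>!]=?"),
    ("PUNCT",      r"[(){};,]"),
    ("SKIP",       r"\s+"),                  # whitespace is discarded
    ("INVALID",    r"."),                    # anything else is an error
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)

def tokenize(source):
    """Return a list of (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "INVALID":
            raise ValueError(f"invalid token {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens
```

Calling tokenize on one line of the input file at a time would give the per-line token listing that the toolkit is meant to produce.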

Left Recursion: A grammar G = (V, T, P, S) is left recursive if it has a production of
the form
A → Aα | β
This grammar is left recursive because the non-terminal on the left of the production
occurs in the first position on the right side of the production. Left recursion can be
eliminated by replacing this pair of productions with
A → βA′
A′ → αA′ | ϵ
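
The transformation for immediate left recursion can be sketched in Python as shown here. The function name and the representation (each production body is a list of symbols, with the empty list standing for ϵ) are assumptions made for this example. The sketch handles only immediate left recursion of the form A → Aα | β, not indirect left recursion through other non-terminals.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    bodies is a list of production bodies; each body is a list of symbols
    and the empty list [] stands for epsilon.
    """
    # alpha parts: what follows the leading A in each left-recursive body
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # beta parts: the bodies that do not start with A
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }
```

For example, applying it to E → E + T | T yields E → T E′ and E′ → + T E′ | ϵ.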
Left factoring removes the common prefix that appears in two or more productions
of the same non-terminal. It is done to avoid backtracking by the parser. Suppose the
parser has one symbol of look-ahead, and consider this example:

A -> qB | qC

where A, B and C are non-terminals and q is a string of grammar symbols.

In this case, the parser cannot tell from the look-ahead which of the two productions to
choose, and it might have to backtrack. After left factoring, the grammar is converted to:

A -> qD
D -> B | C
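
A single left-factoring pass can be sketched in Python as follows. The function names and the list-of-symbols representation are assumptions made for this example, and the new non-terminals are named by adding primes to the original name (A′ where the text above uses D). The sketch groups alternatives by their first symbol and factors out the longest common prefix of each group. Further passes may be needed if the new productions again share a prefix.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared at the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, bodies):
    """One pass of left factoring for the productions of a single non-terminal."""
    groups = {}
    for body in bodies:
        key = body[0] if body else None      # None groups epsilon bodies
        groups.setdefault(key, []).append(body)
    result = {head: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)       # no shared prefix to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_head = head + "'" * primes       # hypothetical naming scheme
        result[head].append(prefix + [new_head])
        result[new_head] = [body[len(prefix):] for body in group]
    return result
```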
A Recursive Descent Parser is a kind of top-down parser. A top-down parser
builds the parse tree from the top down, starting with the start non-terminal.
A Predictive Parser is a special case of a Recursive Descent Parser in which no
backtracking is required.
Writing a grammar carefully, that is, eliminating left recursion and applying left
factoring, in most practical cases yields a grammar that can be parsed by a predictive
recursive descent parser.
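
As an illustration, a predictive recursive descent validator can be sketched in Python with one method per non-terminal. The sketch assumes the expression grammar E → E + T | T, T → T * F | F, F → (E) | a after left recursion has been removed. The class and function names are hypothetical, the input is assumed to contain no whitespace, and a trailing $ end marker is optional.

```python
class ExpressionValidator:
    """Predictive recursive descent for the grammar
         E  -> T E'        E' -> + T E' | epsilon
         T  -> F T'        T' -> * F T' | epsilon
         F  -> ( E ) | a
    """

    def __init__(self, text):
        # Accept an optional single end marker "$".
        self.text = text[:-1] if text.endswith("$") else text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {self.pos}")
        self.pos += 1

    def parse_e(self):
        self.parse_t()
        self.parse_e_prime()

    def parse_e_prime(self):
        if self.peek() == "+":       # E' -> + T E'
            self.match("+")
            self.parse_t()
            self.parse_e_prime()
        # otherwise E' -> epsilon

    def parse_t(self):
        self.parse_f()
        self.parse_t_prime()

    def parse_t_prime(self):
        if self.peek() == "*":       # T' -> * F T'
            self.match("*")
            self.parse_f()
            self.parse_t_prime()
        # otherwise T' -> epsilon

    def parse_f(self):
        if self.peek() == "(":
            self.match("(")
            self.parse_e()
            self.match(")")
        else:
            self.match("a")

def is_valid(text):
    """True if the whole input is derivable from E."""
    parser = ExpressionValidator(text)
    try:
        parser.parse_e()
    except SyntaxError:
        return False
    return parser.peek() is None     # no unread input may remain
```

Each method looks at only one symbol of look-ahead to choose a production, so no backtracking occurs.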

FIRST and FOLLOW are two functions associated with a grammar that help us
fill in the entries of the predictive parsing table M.
FIRST(α) is the set of terminals that begin the strings derived from α.
A terminal c is in FIRST(α) if and only if α ⇒* cβ for some sequence β of grammar
symbols. If α ⇒* ϵ, then ϵ is also in FIRST(α).
FOLLOW(A) is the set of terminals that can appear immediately to the right of A in
some sentential form.
FOLLOW(A) = {a | S ⇒* αAaβ, where a is a terminal and α, β are any strings of grammar symbols}
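
Both sets can be computed by iterating until nothing changes. The Python sketch below is one possible way to do this and is not taken from the toolkit: the function names, the dictionary representation of the grammar (non-terminal to list of bodies, with [] for ϵ) and the sample GRAMMAR are assumptions for the example. It also follows the usual convention of placing the end marker $ in FOLLOW of the start symbol.

```python
EPSILON = "ε"
END = "$"

def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each symbol."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)              # every symbol can vanish (or none exist)
    return result

def compute_first(grammar):
    terminals = {s for bodies in grammar.values() for body in bodies for s in body}
    terminals -= set(grammar)
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                   # repeat until a fixed point is reached
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue     # FOLLOW is defined for non-terminals only
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPSILON}
                    if EPSILON in rest:
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

# Sample grammar (an assumption for this example): expressions without left recursion.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["a"]],
}
```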

String Validation in SLR parser


SLR stands for "Simple LR". An SLR parser is simple and economical to implement, but it
fails to produce a conflict-free parsing table for some classes of grammars, which is why
CLR and LALR parsers, which handle larger classes of grammars, are also used. It
constructs a parsing table that is used to parse input strings.
SLR(1): A grammar that has a conflict-free SLR parsing table is said to be SLR(1).
Canonical Collection of LR(0) items
An LR(0) item is a production of the grammar G with a dot at some position on the right
side of the production.
LR(0) items indicate how much of the input has been scanned up to a given point in the
parsing process.
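
The canonical collection can be built with the standard closure and goto operations. The Python sketch below shows one way to do this under stated assumptions: the function names, the item representation (head, body, dot position) and the naming of the augmented start symbol are chosen for the example and are not taken from the toolkit. Building the SLR action table from these states and the FOLLOW sets is not shown.

```python
def closure(items, grammar):
    """Add every item B -> .gamma whenever a dot stands before non-terminal B.

    An item is a tuple (head, body, dot) where body is a tuple of symbols.
    """
    result = set(items)
    pending = list(items)
    while pending:
        head, body, dot = pending.pop()
        if dot < len(body) and body[dot] in grammar:
            for production in grammar[body[dot]]:
                item = (body[dot], tuple(production), 0)
                if item not in result:
                    result.add(item)
                    pending.append(item)
    return frozenset(result)

def goto(items, symbol, grammar):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved, grammar)

def canonical_collection(grammar, start):
    """Return (states, transitions) for the grammar augmented with S' -> S."""
    augmented = dict(grammar)
    new_start = start + "'"              # hypothetical name for the new start symbol
    augmented[new_start] = [[start]]
    states = [closure({(new_start, (start,), 0)}, augmented)]
    transitions = {}
    index = 0
    while index < len(states):           # states is extended as new sets appear
        state = states[index]
        for symbol in sorted({b[d] for _, b, d in state if d < len(b)}):
            target = goto(state, symbol, augmented)
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions
```

For instance, canonical_collection({"E": [["a", "E", "a"], ["b", "E", "b"], ["c"]]}, "E") builds the item sets for the grammar E → aEa | bEb | c.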
OBJECTIVES:
General objectives

· To increase efficiency and improve the services provided to the customers through a
better application experience in using compilers.
· To provide an all-in-one toolkit for resolving ambiguities, performing lexical analysis and related tasks.
· To solve grammar problems in an efficient way.
· To break down the data into computer-readable formats.
· To resolve ambiguities of the language pertaining to the input string of tokens.

Specific objectives

· To enable calculation of the parsing table and grammar checking at run time.
· To make the toolkit simple and reliable.
· To improve accuracy.
· To improve the efficiency of compilers.

PROJECT SCOPE:
The toolkit combines lexical analysis with grammar transformations such as the removal
of left recursion and left factoring. Checking grammars and resolving ambiguities is a
tricky task, and the toolkit aims to perform these steps efficiently. The SLR method is
sufficiently powerful to be useful for practical grammars and is the easiest of the LR
methods to implement. The toolkit works on a given SLR grammar to validate input strings.
However, the SLR method is not powerful enough to resolve the inadequate states of every
grammar. Other algorithms extended from it can resolve them, but their implementations
are more elaborate.
Features of the Toolkit:
The implemented tool has the following features:

1. Lexical analysis phase of the compiler: the user only needs to provide the path of
an input file (the file can be C/C++, Java, Python, etc.) and the tool tokenizes each
line of the given input file.

2. Removal of left recursion from the given input grammar.

3. Left factoring of the given input grammar.

4. String validation with a recursive descent parser for the following
fixed grammar:

1) E--> E + T
2) E--> T
3) T--> T * F
4) T--> F
5) F--> (E)
6) F--> a

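The input string will be something like a+a$, a+a*a$, a+(a+a*a)$ and so on. Because this
grammar is left recursive, left recursion has to be eliminated from it before a recursive
descent parser can use it.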

5. Calculation of the FIRST sets for the given input grammar.

6. Calculation of the FOLLOW sets for the given input grammar.

7. Calculation of the FIRST and FOLLOW sets simultaneously for the given input
grammar.

8. String validation with an SLR parser for the following fixed grammar:

1) E--> a E a
2) E--> b E b
3) E--> c

The input string will be something like c, aca, aabbcbbaa, bacab and so on.

9. Canonical collection of LR(0) items and the SLR parsing table for
the given grammar.
LANGUAGES:

Java and Python

ENVIRONMENT:

JDK 11.0.4, JRE, Swing-Designer, Window Builder.
