
PARSING AND PARSING TECHNIQUES

Presented By:
(Group F)

OKOLO FRANKLIN HCSF/21/0046

OSHINBOLU SONIA OLUWATOYIN HCSF/21/0016

ADESINA ADAM ADKUNLE HCSF/21/0026

EDIGA OLACHENE GODDAY HCSF/21/0010

OLAJIDE OLUWATOBI JOSEPH HCSF/21/0056

ABIONA JOSHUA OLUMIDE HCSF/21/0036

ALEXANDER VICTORIA OLORUNWA HCSF/21/0006

Supervised By:
MRS. OJO ADJOSE

30th January 2023

ABSTRACT

Parsing is the process of transforming data from one format to another. It is carried out by a
parser, a component of a translator that organises linear text according to a set of defined
rules known as a grammar.

INTRODUCTION

Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural
language or in computer languages, according to the rules of a formal grammar. The term
parsing comes from Latin pars (ōrātiōnis), meaning part (of speech). The term has slightly
different meanings in different branches of linguistics and computer science. Traditional
sentence parsing is often performed as a method of understanding the exact meaning of a
sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes
the importance of grammatical divisions such as subject and predicate. Within computational
linguistics the term is used to refer to the formal analysis by computer of a sentence or other
string of words into its constituents, resulting in a parse tree showing their syntactic relation
to each other, which may also contain semantic and other information. The term is also used
in psycholinguistics when describing language comprehension. In this context, parsing refers
to the way that human beings analyze a sentence or phrase (in spoken language or text) "in
terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."
This term is especially common when discussing what linguistic cues help speakers to
interpret garden-path sentences. Within computer science, the term is used in the analysis of
computer languages, referring to the syntactic analysis of the input code into its component
parts in order to facilitate the writing of compilers and interpreters.

HISTORY

Parsing is the process of structuring a linear representation in accordance with a given
grammar. This definition has been kept abstract on purpose, to allow as wide an interpretation
as possible. The “linear representation” may be a sentence, a computer program, a knitting
pattern, a sequence of geological strata, a piece of music, actions in ritual behavior, in short
any linear sequence in which the preceding elements in some way restrict the next element.
For some of the examples the grammar is well-known, for some it is an object of research and
for some our notion of a grammar is only just beginning to take shape. For each grammar,
there are generally an infinite number of linear representations (“sentences”) that can be
structured with it. That is, a finite-size grammar can supply structure to an infinite number of
sentences. This is the main strength of the grammar paradigm and indeed the main source of
the importance of grammars: they summarize succinctly the structure of an infinite number of
objects of a certain class. There are several reasons to perform this structuring process called
parsing. One reason derives from the fact that the obtained structure helps us to process the
object further. When we know that a certain segment of a sentence in German is the subject,
that information helps in translating the sentence. Once the structure of a document has been
brought to the surface, it can be converted more easily. A second is related to the fact that the
grammar in a sense represents our understanding of the observed sentences: the better a
grammar we can give for the movements of bees, the deeper our understanding of them is. A
third lies in the completion of missing information that parsers, and especially error-repairing
parsers, can provide. Given a reasonable grammar of the language, an error-repairing parser
can suggest possible word classes for missing or unknown words on clay tablets.

How parsing works

A parser is a program that is part of the compiler, and parsing is part of the compiling
process. Parsing happens during the analysis stage of compilation.

In parsing, code is taken from the pre-processor, broken into smaller pieces and analyzed so
other software can understand it. The parser does this by building a data structure out of the
pieces of input.

More specifically, a person writes code in a human-readable language like C++ or Java and
saves it as a series of text files. The parser takes those text files as input and breaks them
down so they can be translated on the target platform.

The parser consists of three components, each of which handles a different stage of the
parsing process. The three stages are:

Stage 1: Lexical analysis

A lexical analyzer -- or scanner -- takes code from the preprocessor and breaks it into smaller
pieces. It groups the input code into sequences of characters called lexemes, each of which
corresponds to a token. Tokens are units of grammar in the programming language that the
compiler understands.

Lexical analyzers also remove white space characters, comments and errors from the input.

TOKEN   CLASS
x       Identifier
+       Addition operator
z       Identifier
=       Assignment operator
11      Number

Fig 1: Parse Token Classification


https://www.techtarget.com/searchapparchitecture/definition/parser

Given the set of characters x+z=11, the lexical analyzer would separate it into a series of
tokens and classify them as shown.
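
A minimal sketch of such a scanner is shown below, written in Python. The token class names
and regular expressions are assumptions made only for this illustration; they are not the rules
of any particular compiler.

import re

# Illustrative token specification: each token class is paired with a regular expression.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ADD_OP",     r"\+"),
    ("ASSIGN_OP",  r"="),
    ("SKIP",       r"\s+"),      # white space is discarded, as noted above
]

def tokenize(code):
    # Break the input characters into (token class, lexeme) pairs.
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    for match in re.finditer(pattern, code):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize("x+z=11")))
# [('IDENTIFIER', 'x'), ('ADD_OP', '+'), ('IDENTIFIER', 'z'),
#  ('ASSIGN_OP', '='), ('NUMBER', '11')]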

Stage 2: Syntactic analysis

This stage of parsing checks the syntactical structure of the input, using a data structure called
a parse tree or derivation tree. A syntax analyzer uses tokens to construct a parse tree that
combines the predefined grammar of the programming language with the tokens of the input
string. The syntactic analyzer reports a syntax error if the syntax is incorrect.

Fig 2: Parse Tree


https://www.techtarget.com/searchapparchitecture/definition/parser

The syntactic analyzer takes (x+y)*3 as input and returns this parse tree, which enables the
parser to understand the expression.
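
To make the idea concrete, the Python sketch below represents a parse tree for (x+y)*3 as
nested tuples. The node labels are assumptions of this sketch rather than a standard notation;
it simply shows the kind of structure that later phases of the compiler can walk.

# Parse tree for (x + y) * 3: (operator, left subtree, right subtree),
# with ("id", name) and ("num", value) as the leaves.
parse_tree = ("*",
              ("+", ("id", "x"), ("id", "y")),
              ("num", 3))

def evaluate(node, env):
    # Walk the tree bottom-up; the syntactic structure fixes the order of operations.
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "id":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    return left + right if kind == "+" else left * right

print(evaluate(parse_tree, {"x": 2, "y": 4}))   # (2 + 4) * 3 = 18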

Stage 3: Semantic analysis

Semantic analysis verifies the parse tree against a symbol table and determines whether it is
semantically consistent. This process is also known as context sensitive analysis. It
includes data type checking, label checking and flow control checking.

If the code provided is this:

float a = 30.2; float b = a*20;

then the analyzer will treat 20 as 20.0 before performing the operation.
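
A rough sketch of that kind of check is shown below in Python, assuming a tiny type system
with only "int" and "float" invented for this illustration. It mirrors how the analyzer above
promotes 20 to 20.0.

# Toy semantic check: infer a type for each parse-tree node, consulting a symbol table
# for identifiers and promoting int operands to float when the operand types differ.
def check(node, symbols):
    kind = node[0]
    if kind == "num":
        return "float" if isinstance(node[1], float) else "int"
    if kind == "id":
        return symbols[node[1]]            # data type lookup in the symbol table
    left, right = check(node[1], symbols), check(node[2], symbols)
    return "float" if "float" in (left, right) else "int"

symbols = {"a": "float", "b": "float"}
expr = ("*", ("id", "a"), ("num", 20))     # the expression a * 20
print(check(expr, symbols))                # 'float' -- 20 is treated as 20.0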

Some sources refer only to the syntactic analysis stage as parsing because it generates the
parse tree. They leave out lexical and semantic analysis.

The technical process of parsing has three stages:


1. Lexical Analysis: produces tokens from a stream of input characters. A token is the
smallest unit in a programming language that possesses some meaning (such as +, -,
*, “function”, or “new” in JavaScript).

2. Syntactic Analysis: checks to see if the generated tokens form a meaningful
expression.

3. Semantic Parsing: uses syntax trees and symbol look-up tables to determine whether
the generated tokens are semantically consistent with a specific programming
language.

When a software language is created, its creators must specify a set of rules. These rules
provide the grammar needed to construct valid statements in the language.

The following is a set of grammatical rules for a simple fictional language that only contains
a few words:

<sentence> ::= <subject> <verb> <object>
<subject> ::= <article> <noun>
<article> ::= the | a
<noun> ::= dog | cat | person
<verb> ::= pets | fed
<object> ::= <article> <noun>

In this language, a sentence must contain a subject, verb and object in that order, and specific
words are matched to the parts of speech. A subject is an article followed by a noun. A noun
can be one of the following three words: dog, cat or person. And a verb can only
be pets or fed.

Parsing checks a statement that a user provides as input against these rules to prove that the
statement is valid. Different parsing algorithms check in different orders. There are two main
types of parsers:

Fig 3: Types of Parser (top-down parsing and bottom-up parsing)
https://www.javatpoint.com/parser

• Top-down parsers. These start with a rule at the top, such as
<sentence> ::= <subject> <verb> <object>.
Given the input string "The person fed a cat", the parser looks at the first rule and works
its way down the rules, checking that each one holds. In this case, the first two words
form a <subject> (an <article> followed by a <noun>), so the parser continues reading
the sentence looking for a <verb>.

• Bottom-up parsers. These start with the rule at the bottom. In this case, the parser would
look for an <object> first, then look for a <verb> next and so on.

More simply put, top-down parsers begin their work at the start symbol of the grammar at the
top of the parse tree. They then work their way down from the rule to the sentence. Bottom-
up parsers work their way up from the sentence to the rule.
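
The sketch below shows a minimal top-down (recursive descent) parser for the toy grammar
above, written in Python. The function and variable names are invented for this illustration;
it is a sketch of the technique rather than a production parser.

# Word classes for the toy grammar: <article>, <noun> and <verb>.
GRAMMAR = {
    "article": {"the", "a"},
    "noun":    {"dog", "cat", "person"},
    "verb":    {"pets", "fed"},
}

def expect(words, pos, category):
    # Consume one word if it belongs to the expected class, otherwise report a syntax error.
    if pos < len(words) and words[pos] in GRAMMAR[category]:
        return pos + 1
    raise SyntaxError(f"expected <{category}> at position {pos}")

def parse_phrase(words, pos):
    # <subject> and <object> share the same shape: <article> <noun>.
    pos = expect(words, pos, "article")
    return expect(words, pos, "noun")

def parse_sentence(words):
    # Start at the top rule <sentence> ::= <subject> <verb> <object> and work downwards.
    pos = parse_phrase(words, 0)       # <subject>
    pos = expect(words, pos, "verb")   # <verb>
    pos = parse_phrase(words, pos)     # <object>
    if pos != len(words):
        raise SyntaxError("unexpected words after the sentence")
    return True

print(parse_sentence("the person fed a cat".split()))   # True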

Beyond these types, it's important to know the two types of derivation. Derivation is the order
in which the grammar reconciles the input string. They are:

• LL parsers. These parse input from left to right using leftmost derivation to match the
rules in the grammar to the input. This process derives a string that validates the input by
expanding the leftmost element of the parse tree.

• LR parsers. These parse input from left to right, constructing a rightmost derivation in
reverse. This process derives a string by expanding the rightmost element of the parse
tree (a worked derivation for the toy grammar follows below).
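
For example, using the toy grammar above, a leftmost derivation of "the dog pets a cat"
expands the leftmost non-terminal at every step:

<sentence> => <subject> <verb> <object>
           => <article> <noun> <verb> <object>
           => the <noun> <verb> <object>
           => the dog <verb> <object>
           => the dog pets <object>
           => the dog pets <article> <noun>
           => the dog pets a <noun>
           => the dog pets a cat

A rightmost derivation expands the rightmost non-terminal first; an LR parser reconstructs
such a derivation in reverse as it reads the input from left to right.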

In addition, there are other types of parsers, including the following:

• Recursive descent parsers. These use top-down parsing and may backtrack after a
decision point to try an alternative rule when the first choice fails to match.

• Earley parsers. These parse all context-free grammars, unlike LL and LR parsers. Most
real-world programming languages do not use context-free grammars.

• Shift-reduce parsers. These shift and reduce an input string. At each stage in the string,
they reduce the word to a grammar rule. This approach reduces the string until it has been
completely checked (see the sketch below).
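
As a rough illustration of shift-reduce parsing for the toy grammar defined earlier, the Python
sketch below shifts words onto a stack and reduces whenever the top of the stack matches the
right-hand side of a rule. A real shift-reduce parser consults a parsing table to decide between
shifting and reducing; here that decision is hard-coded purely for illustration.

WORD_CLASS = {"the": "<article>", "a": "<article>",
              "dog": "<noun>", "cat": "<noun>", "person": "<noun>",
              "pets": "<verb>", "fed": "<verb>"}

def reduce_stack(stack):
    # Keep reducing while the top of the stack matches a rule's right-hand side.
    while True:
        if stack[-2:] == ["<article>", "<noun>"]:
            # <article> <noun> is the <subject> at the start of the sentence,
            # otherwise it is the <object>.
            stack[-2:] = ["<subject>"] if len(stack) == 2 else ["<object>"]
        elif stack[-3:] == ["<subject>", "<verb>", "<object>"]:
            stack[-3:] = ["<sentence>"]
        else:
            return

def shift_reduce(words):
    stack = []
    for word in words:
        stack.append(WORD_CLASS[word])   # shift the next word (as its word class)
        reduce_stack(stack)              # reduce as far as possible
    return stack == ["<sentence>"]

print(shift_reduce("the dog pets a cat".split()))   # True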

LR PARSER AND TYPES OF LR PARSERS

LR parsing is one type of bottom-up parsing. It is used to parse a large class of grammars.

In LR(k) parsing, "L" stands for left-to-right scanning of the input,

"R" stands for constructing a rightmost derivation in reverse, and

"k" is the number of lookahead input symbols used to make parsing decisions.

LR parsing is divided into four parts: LR(0) parsing, SLR parsing, CLR parsing and LALR
parsing.

Fig 4: Types of LR Parser (LR(0), SLR(1), LALR(1) and CLR(1))
https://www.javatpoint.com/lr-parser

1. SLR(1) Parsing
It is similar to the LR(0) parser, but reduce entries are placed in the table only where the
follow sets of the grammar allow them.
It has a small number of states and hence a very small table.
Its construction is simple and fast.

2. LALR(1) Parsing
It is the lookahead LR parser.
It attempts to reduce the number of states in the LR parser by merging similar states.
This brings the number of states down to that of SLR(1) while retaining some of the power
of the LR(1) lookaheads.
It works on an intermediate size of grammar.

3. CLR(1) Parsing
CLR refers to canonical lookahead LR parsing.
It uses the canonical collection of LR(1) items to construct its parsing table.
It has more states than SLR parsing.
Its reduce entries are placed only under the lookahead symbols of each item.

Technologies That Use Parsing
Parsers are used when there is a need to represent input data from source code abstractly as
a data structure so that it can be checked for the correct syntax. Coding languages and other
technologies use parsing of some type for this purpose.

Technologies that use parsing to check code inputs include the following:

Programming languages. Parsers are used in all high-level programming languages,
including the following:

• C++
• Extensible Markup Language or XML
• Hypertext Markup Language or HTML
• Hypertext Preprocessor or PHP
• Java
• Python

Database languages. Database languages such as Structured Query Language also use
parsers.

Protocols. Protocols like the Hypertext Transfer Protocol and internet remote function calls
use parsers.

Parser generators. Parser generators take a grammar as input and generate parser source
code, which is parsing in reverse. Many of them also construct scanners from regular
expressions, which are special strings used to manage and match patterns in text.
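
As an example of this workflow, the sketch below feeds the toy sentence grammar from earlier
into Lark, a third-party Python parser generator. The exact grammar syntax shown here is
written from memory and may need small adjustments; it is meant only to illustrate how a
generator turns a grammar into a working parser.

from lark import Lark   # third-party package: pip install lark

grammar = """
    sentence: subject VERB object
    subject:  ARTICLE NOUN
    object:   ARTICLE NOUN
    ARTICLE:  "the" | "a"
    NOUN:     "dog" | "cat" | "person"
    VERB:     "pets" | "fed"
    %import common.WS
    %ignore WS
"""

parser = Lark(grammar, start="sentence")      # the generated parser
tree = parser.parse("the person fed a cat")   # build the parse tree
print(tree.pretty())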

CONCLUSION

REFERENCES
CSE Department, Dronacharya College of Engineering, Gurgaon, et al. (2013). Parsing Techniques.
International Journal of Advanced Research in Engineering and Applied Sciences.
https://www.javatpoint.com/lr-parser
https://byjus.com/gate/parsing-in-compiler-design
Ben Lutkevich. Parser. Retrieved from:
https://www.techtarget.com/searchapparchitecture/definition/parser
