CH 6
Lecture 6
The phases of a compiler
◼ A compiler can be divided into two parts:
analysis and synthesis.
Syntax analysis (parsing)
◼ Syntax analysis or parsing is the second phase of a compiler.
◼ We have seen that a lexical analyzer can identify tokens with the help of
regular expressions and pattern rules.
◼ But a lexical analyzer cannot check the syntax of a given sentence because of the
limitations of regular expressions.
◼ Regular expressions cannot check balanced tokens, such as nested parentheses.
◼ Therefore, this phase uses a context-free grammar (CFG); the languages such grammars
describe are recognized by push-down automata.
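The stack is what gives a push-down automaton the memory that a regular expression lacks. As a minimal sketch of that idea (the function name `is_balanced` is chosen here for illustration and is not from the lecture), a parenthesis checker can push on every opening parenthesis and pop on every closing one:

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')'."""
    stack = []  # plays the role of the push-down automaton's stack
    for ch in text:
        if ch == "(":
            stack.append(ch)   # push on an opening parenthesis
        elif ch == ")":
            if not stack:      # closing parenthesis with nothing to match
                return False
            stack.pop()        # pop the matching opening parenthesis
    return not stack           # balanced only if nothing is left open


print(is_balanced("(9-(5+2))"))  # True
print(is_balanced("(9-5))"))     # False
```

The nesting depth is unbounded, which is why a finite automaton, and therefore a regular expression, cannot do the same job.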
Syntax Analyzer
◼ A syntax analyzer or parser takes the input from a lexical analyzer in the
form of token streams.
◼ The parser analyzes the token stream against the production rules to
detect any errors in the code.
◼ The output of this phase is a parse tree.
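To make the idea of a token stream concrete, here is a small sketch of a lexical analyzer for expressions made of digits, plus, and minus. The token names (`DIGIT`, `PLUS`, `MINUS`), the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for this illustration, not part of the lecture:

```python
import re

# Hypothetical token patterns; each pair is (token kind, regular expression).
TOKEN_SPEC = [
    ("DIGIT", r"[0-9]"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("9 - 5 + 2"))
```

A list of pairs like this is what the parser would then check against the production rules.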
Context-Free Grammar
◼ A context-free grammar has four components:
1. A set of non-terminals (V):
Non-terminals are syntactic variables that denote sets of strings.
These sets of strings help define the language generated by the grammar.
2. A set of tokens, known as terminal symbols (Σ):
Terminals are the basic symbols from which strings are formed.
3. A set of productions (P):
The productions of a grammar specify the manner in which the terminals and non-terminals
can be combined to form strings.
Each production consists of a non-terminal, called the left side of the production, an
arrow, and a sequence of tokens and/or non-terminals, called the right side of the
production.
4. One of the non-terminals is designated as the start symbol (S), from which every
derivation begins.
Example
◼ Some examples use expressions consisting of digits and plus and minus signs;
e.g., strings such as 9-5+2, 3-1, or 7.
◼ The following grammar describes the syntax of these expressions.
◼ The non-terminals (V) are: list, digit
◼ list is the start symbol (S) because its productions are given first.
◼ 0 1 2 3 4 5 6 7 8 9 and the operators + - are the terminal symbols (Σ)
The productions (P) are:
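The production list itself is not reproduced on this slide. Assuming the usual textbook grammar for such expressions (list → list + digit | list - digit | digit, and digit → 0 | 1 | ... | 9), which is an assumption rather than something confirmed here, the four components can be sketched as plain Python data. The names `NON_TERMINALS`, `TERMINALS`, `START`, `PRODUCTIONS`, and `is_well_formed` are illustrative:

```python
# Four components of the grammar, under the ASSUMED productions
#   list  -> list + digit | list - digit | digit
#   digit -> 0 | 1 | ... | 9
NON_TERMINALS = {"list", "digit"}               # V
TERMINALS = set("0123456789") | {"+", "-"}      # Sigma
START = "list"                                  # S
PRODUCTIONS = {                                 # P: left side -> right sides
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def is_well_formed():
    """Check that the start symbol and every production use declared symbols."""
    symbols = NON_TERMINALS | TERMINALS
    return (
        START in NON_TERMINALS
        and set(PRODUCTIONS) <= NON_TERMINALS
        and all(
            sym in symbols
            for bodies in PRODUCTIONS.values()
            for body in bodies
            for sym in body
        )
    )


print(is_well_formed())  # True
```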
Derivation
◼ Strings are derived from the start symbol by repeatedly replacing a non-terminal
(initially the start symbol) with the right side of one of that non-terminal's
productions.
◼ During parsing, we make two decisions at each step:
1. Which non-terminal to replace.
2. Which production rule to replace it with.
◼ For the first decision, which non-terminal to replace, there are two options
(types of derivation):
1. Left-most Derivation
2. Right-most Derivation
Types of Derivation
➢ Left-most Derivation
◼ If the left-most non-terminal of the sentential form is the one replaced at every step,
the derivation is called a left-most derivation.
◼ A sentential form produced by a left-most derivation is called a left-sentential
form.
➢ Right-most Derivation
◼ If the right-most non-terminal of the sentential form is the one replaced at every step,
the derivation is called a right-most derivation.
◼ A sentential form produced by a right-most derivation is called a right-sentential
form.
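The two derivation orders can be compared on the string 9-5+2. The sketch below assumes the textbook grammar list → list + digit | list - digit | digit, digit → 0 | ... | 9 (an assumption, since the productions are not spelled out here), and the helper `derive` is a hypothetical name. It replaces the left-most or right-most non-terminal with each production body in turn and records every sentential form:

```python
NON_TERMINALS = {"list", "digit"}
# ASSUMED productions, written as space-separated right sides.
PRODUCTIONS = {
    "list": {"list + digit", "list - digit", "digit"},
    "digit": set("0123456789"),
}


def derive(start, bodies, leftmost=True):
    """Apply each body to the left-most (or right-most) non-terminal and
    return every sentential form along the way."""
    form = [start]
    forms = [" ".join(form)]
    for body in bodies:
        positions = [i for i, sym in enumerate(form) if sym in NON_TERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if body not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {body} is not a production")
        form = form[:i] + body.split() + form[i + 1:]
        forms.append(" ".join(form))
    return forms


left = derive("list", ["list + digit", "list - digit", "digit", "9", "5", "2"])
right = derive("list", ["list + digit", "2", "list - digit", "5", "digit", "9"],
               leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same string, 9 - 5 + 2; only the order in which the non-terminals are expanded differs.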
Example
Parse Tree
Parse Tree: Example
(Steps 1 to 4; figures not available)
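As a rough illustration of what a parse tree for 9-5+2 could look like, the sketch below builds one as nested tuples. It assumes the textbook grammar list → list + digit | list - digit | digit, digit → 0 | ... | 9 (not confirmed by these slides), and the names `parse_list` and `leaves` are hypothetical. Because the grammar is left-recursive, the tree grows down its left side, so the loop wraps the tree built so far at each operator:

```python
DIGITS = "0123456789"


def parse_list(text):
    """Build a parse tree (nested tuples) under the ASSUMED grammar
    list -> list + digit | list - digit | digit."""
    tokens = [ch for ch in text if not ch.isspace()]
    if not tokens or tokens[0] not in DIGITS:
        raise SyntaxError("expected a digit")
    tree = ("list", ("digit", tokens[0]))          # list -> digit
    pos = 1
    while pos < len(tokens):
        op = tokens[pos]
        if op not in "+-":
            raise SyntaxError(f"expected + or -, got {op!r}")
        if pos + 1 >= len(tokens) or tokens[pos + 1] not in DIGITS:
            raise SyntaxError("expected a digit after the operator")
        # list -> list op digit: the old tree becomes the left child
        tree = ("list", tree, op, ("digit", tokens[pos + 1]))
        pos += 2
    return tree


def leaves(tree):
    """Return the terminals at the leaves, read left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


tree = parse_list("9-5+2")
print(tree)
print("".join(leaves(tree)))
```

Reading the leaves from left to right gives back the input string, which is the defining property of a parse tree for that string.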