Lexical and Syntax Analysis: Topics

Topics
Introduction
Chapter 4
Lexical Analysis
Syntax Analysis
Lexical and Syntax Analysis Recursive -Descent Parsing
Bottom-Up parsing
Chapter 4: Lexical and Syntax Analysis 2
Language Implementation Compilation

There are three possible approaches to
translating human readable code to
machine code
1. Compilation
2. Interpretation
3. Hybrid
Chapter 4: Lexical and Syntax Analysis 3 Chapter 4: Lexical and Syntax Analysis 4
Issues in Lexical and Syntax

Introduction
Analysis
The syntax analysis portion of a Reasons for separating both analysis:
language processor nearly always 1) Simpler design.
§ Separation allows the simplification of one or the other.
consists of two parts: § Example: A parser with comments or white spaces is more
complex
n A low-level part called a lexical analyzer
2) Compiler efficiency is improved.
Based on a regular grammar. • Optimization of lexical analysis because a large amount of
time is spent reading the source program and partitioning it
Output: set of tokens. into tokens.
n A high-level part called a syntax analyzer 3) Compiler portability is enhanced.
§ Input alphabet peculiarities and other device-specific
Based on a context-free grammar or BNF
anomalies can be restricted to the lexical analyzer.
Output: parse tree.
1
Lexical Analyzer Examples of Tokens
First phase of a compiler. Example of an assignment
It is also called scanner. result = value / 100;
Main task: read the input characters and
produce as output a sequence of tokens.
Token Lexeme
Process: IDENT result
Input: program as a single string of characters. ASSIGNMENT_OP =
IDENT value
Collects characters into logical groupings and assigns
DIVISION_OP /
internal codes to the groupings according to their
INT_LIT 100
structure. SEMICOLON ;
Groupings: lexemes
Internal codes: tokens
Interaction between lexical and

Lexical Analysis
syntax analyzers
Secondary tasks:
token parse
n Stripping out from the source program
source lexical syntax tree comments and white spaces in the form of
program analyzer analyzer blank, tab, and new line characters.
get next token
n Correlating error messages from the
compiler with the source program.
symbol
n Inserting lexemes for user-defined names
table into the symbol table.
Building a Lexical Analyzer State Transition Diagram

Three different approaches: Directed graph
n Write a formal description of the tokens and Nodes are labeled with state names.
use a software tool that constructs table-driven Arcs are labeled with the input characters that
lexical analyzers given such a description (e,g, cause the transitions
lex) An arc may also include actions the lexical
n Design a state diagram that describes the analyzer must perform when the transition is
tokens and write a program that implements taken.
the state diagram A state diagrams represent a finite automaton
n Design a state diagram that describes the which recognizes regular languages
tokens and hand-construct a table- driven (expressions).
implementation of the state diagram
2
State Diagram Design State Diagram: Example
A naive state diagram would have a transition
from every state on every character in the
source language - such a diagram would be very
large.
In many cases, transitions can be combined to
simplify the state diagram
n When recognizing an identifier, all uppercase and
lowercase letters are equivalent
Use a character class that includes all letters
n When recognizing an integer literal, all digits are
equivalent - use a digit class
Syntax Analyzer Parser

The syntax analyzer or parser must determine Two categories of parsers
the structure of the sequence of tokens provided
to it by the scanner. n Top-down: produce the parse tree,
Check the input program to determine whether beginning at the root down to the
is syntactically correct. leaves.
Produce either a complete parse tree of at least trace n Bottom-up: produce the parse tree,
the structure of the complete parse tree. beginning at the leaves upward to the
Error: produce a diagnostic message and recover
(gets back to a normal state and continue the analysis
root.
of the input program: find as many errors as possible
in one pass).
Conventions Top-Down Parser

Terminal symbols: lowercase letters at the Build the parse tree in preorder.
beginning of the alphabet (a,b,…)
Nonterminal symbols: uppercase letters at the o Begins with the root.
beginning of the alphabet (A,B,…) o Each node is visited before its branches
Terminals or nonterminals: uppercase letters at are followed.
the end of the alphabet (W,X,Y,Z)
o Branches from a particular node are
Strings of terminals: lowercase letters at the end
of the alphabet (w,x,y,z) followed in left-to-right order.
Mixed strings (terminals and/or nonterminals ): Uses leftmost derivation.
lowercase Greek letters (α,β,δ,γ)
3
Next sentential form? Example
Given a sentential form xAα, A is the <S> ::= <NP> <VP> <Det> ::= that| this| a
leftmost nonterminal that could be <S> ::= <aux> <NP> <VP> <Noun> ::= book | flight | meal
expanded to get the next sentential form in <S> ::= <VP> <Verb> ::= book | prefer
a leftmost derivation. <NP> ::= <Proper_Noun> <Aux> ::= does
Current sentential form: xA α <NP> ::= <Det> <Nom> <Prep> ::= from | to | on
A-rules: A? bB, A? cBb, and A? a <Nom> ::= <Noun> <Proper_Noun> ::= Houston
<Nom> ::= <Noun> <Nom>
Next sentential form?:
xbBα or xcBbα or xaα <VP> ::= <Verb>
<VP> ::= <Verb> <NP>
This is known as the parsing decision
problem for top-down parsers.
A miniature English grammar
Example: “Book that flight” Example: “Book that flight”

Suppose the parser is able to build all possible These constituents of these 3 new trees are
partial trees at the same time. expanded in the same way we just expanded
The algorithm begins assuming that the input S; and so on.
can be derived by the designated start symbol S. At each ply we use the RHS of the rules to
The next step is to find the tops of all trees which provide new sets of expectations for the
can start with S, by looking for all the grammars parser, which are then used to recursively
rules with S on the LHS. generate the rest of the tree.
There are 3 rules that expand S, so the second ply, or Trees are grown downward until they
level, has 3 partial trees. eventually reach the terminals at the bottom
of the tree.
Example: “Book that flight” Top-Down Parser

At this point, trees whose leaves fail to match Parsers look only one token ahead in the
all the words in the input can be rejected, input
leaving behind those trees that represent n Given a sentential form, xAα , the parser must
successful parses. choose the correct A -rule to get the next
In this example, only the fifth parse tree (the sentential form in the leftmost derivation,
one which has expanded the rule <VP> ::= using only the first token produced by A
<Verb><NP>) will eventually match the input The most common top-down parsing
sentence. algorithms:
n Recursive descent - a coded implementation
n LL parsers - table driven implementation
4
Bottom-Up Parser Example: “Book that flight”
Bottom-up parsing is the earliest know The parse beginning by looking to each
parsing algorithm. word and building 3 partial trees with the
Build the parse tree beginning at the category of each terminal.
leaves and progressing towards the root. n The word book is ambiguous (it can be a
Order corresponds to the reverse of a noun or a verb). Thus the parser must
rightmost derivation. consider two possible sets of trees.
n The parse is successful if the parser n Each of the trees in the second ply is then
succeeds in building a tree rooted in the start expanded, and so on.
symbol that covers all of the input.
Example: “Book that flight” Top-Down vs. Bottom-Up

n In general, the parser extends one ply to the Advantage (top-down)
next by looking for places in the parse- in- n Never waste time exploring trees that cannot
progress where the right- hand-side of some result in the root symbol (S), since it begins by
rule might fit. generating just those trees.
n In the fifth ply, the interpretation of book as a n Never explores subtrees that cannot find a
noun has been pruned because this parse place in some S-rooted tree
cannot be continued: there is no rule in the
grammar with RHS <Nominal><NP> o Bottom-up: left branch is completely wasted
effort because it is based on interpreting book
n The final ply is the correct parse tree. as Noun at the beginning of the sentence
The most common bottom-up parsing despite the fact no such tree can lead to an S
algorithms are in the LR family given this grammar.
Top-Down vs. Bottom-Up

Disadvantage (top-down)
n Spend considerable effort on S trees that are
not consistent with the input.
n Firsts four of six trees in the third ply have left
branches that cannot match the word book.
n This weakness arises from the fact that they
can generate trees before ever examining the
input.
Chapter 4: Lexical and Syntax Analysis 29

Lexical and Syntax Analysis: Topics

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lexical and Syntax Analysis: Topics

Uploaded by

Copyright:

Available Formats

Topics

Chapter 4: Lexical and Syntax Analysis 2

Language Implementation Compilation

Issues in Lexical and Syntax

Interaction between lexical and

Building a Lexical Analyzer State Transition Diagram

Syntax Analyzer Parser

Conventions Top-Down Parser

Example: “Book that flight” Example: “Book that flight”

Example: “Book that flight” Top-Down Parser

Example: “Book that flight” Top-Down vs. Bottom-Up

Top-Down vs. Bottom-Up

Chapter 4: Lexical and Syntax Analysis 29

You might also like