Professional Documents
Culture Documents
Compiler Construction
[SWE-310]
Lexical Analysis
1
8/5/2020
2
8/5/2020
3
8/5/2020
Token:
A token is a pair consisting of a token name and an
optional attribute value.
The token name is an abstract symbol representing a
kind of lexical unit, e.g., a particular keyword, or a
sequence of input characters denoting an identifier.
The token names are the input symbols that the
parser processes.
In what follows, we shall generally write the name of
a token in boldface. We will often refer to a token by
its token name.
4
8/5/2020
Pattern:
A pattern is a description of the form that the lexemes of
a token may take.
In the case of a keyword as a token, the pattern is just
the sequence of characters that form the keyword.
For identifiers and some other tokens, the pattern is a
more complex structure that is matched by many strings.
Lexeme:
A lexeme is a sequence of characters in the source
program that matches the pattern for a token and is
identified by the lexical analyzer as an instance of that
token.
5
8/5/2020
Buffer Pairs:
A large amount of time is taken to process characters
and large number of characters must be processed
during the compilation of a large source program.
Specialized buffering techniques have been developed to
reduce the amount of overhead required to process a
single input character.
An important scheme involves two buffers that are
alternately reloaded.
6
8/5/2020
7
8/5/2020
8
8/5/2020
9
8/5/2020
10
8/5/2020
11
8/5/2020
Regular Expressions:
Regular expressions is a method for
describing/expressing patterns of a language.
The regular expressions are built recursively out
of smaller regular expressions.
Each regular expression r denotes a language
L(r), which is also denoted recursively from the
languages denoted by r's sub expressions.
12
8/5/2020
13
8/5/2020
Regular Definitions:
For notational convenience, names are
given to certain regular expressions and
those names are used in subsequent
expressions, as if the names were
themselves symbols.
14
8/5/2020
15
8/5/2020
Recognition of Tokens:
After learning how to express patterns
using regular expressions the next task is
to take the patterns for all the needed
tokens and build a piece of code that
examines the input string and find a prefix
that is a lexeme matching one of the
patterns.
16
8/5/2020
17
8/5/2020
18
8/5/2020
19
8/5/2020
20
8/5/2020
21
8/5/2020
22
8/5/2020
23