CS143
Lecture 3
2
Lexical Analysis
3
What’s a Token?
• A syntactic category
– In English:
noun, verb, adjective, …
– In a programming language:
Identifier, Integer, Keyword, Whitespace, …
4
Tokens
5
What are Tokens For?
6
Designing a Lexical Analyzer: Step 1
7
Example
• Recall
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
8
Designing a Lexical Analyzer: Step 2
• Recall:
– Identifier: strings of letters or digits, starting with a letter
– Integer: a non-empty string of digits
– Keyword: “else” or “if” or “begin” or …
– Whitespace: a non-empty sequence of blanks, newlines, and tabs
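The four descriptions above can be written down as patterns. The following is a minimal sketch, assuming Python's `re` module, which writes union as `|` rather than `+`. The table name `TOKEN_PATTERNS` and the function `classify` are hypothetical, and the keyword list contains only the three keywords the slide names.

```python
import re

# Hypothetical pattern table, one entry per token class from the slide.
# Keyword comes first so that "if" is not classified as an Identifier.
TOKEN_PATTERNS = {
    "Keyword": r"if|else|begin",            # only the keywords named on the slide
    "Identifier": r"[A-Za-z][A-Za-z0-9]*",  # letters or digits, starting with a letter
    "Integer": r"[0-9]+",                   # non-empty string of digits
    "Whitespace": r"[ \n\t]+",              # non-empty run of blanks, newlines, tabs
}

def classify(lexeme):
    """Return the first token class whose pattern matches the whole lexeme."""
    for token_class, pattern in TOKEN_PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token_class
    return None
```

Because the table is checked in order, a string such as `iffy` falls through the Keyword entry and is classified as an Identifier.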
9
Lexical Analyzer: Implementation
10
Example
• Recall:
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
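One way to see the implementation idea on this input is a small scanner that repeatedly matches a token at the current position. This is a sketch under stated assumptions, not the course's implementation: the classes `Relation`, `Assign`, and `Punctuation` are invented here so the whole string can be tokenized, and Python's ordered alternation stands in for a true longest-match rule (which is why `==` is listed before `=`).

```python
import re

# Ordered (class, pattern) pairs; earlier entries win.
# Relation, Assign, and Punctuation are assumed classes, not from the slides.
SPEC = [
    ("Whitespace", r"[ \n\t]+"),
    ("Keyword", r"(?:if|else)\b"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer", r"[0-9]+"),
    ("Relation", r"=="),
    ("Assign", r"="),
    ("Punctuation", r"[();]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def tokenize(text):
    """Split text into (token class, lexeme) pairs, left to right."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
# Whitespace is recognized like any other token, then discarded.
tokens = [t for t in tokenize(source) if t[0] != "Whitespace"]
```

After filtering, `tokens` starts with `("Keyword", "if")`, and the lexemes of the unfiltered token list concatenate back to the original string.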
11
Lexical Analyzer: Implementation
12
True Crimes of Lexical Analysis
• Is it as easy as it sounds?
13
Lexical Analysis in FORTRAN
• A terrible design!
14
FORTRAN Example
• Consider
– DO 5 I = 1,25
– DO 5 I = 1.25
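The two lines differ only in `,` versus `.`, yet in fixed-form FORTRAN, where blanks are insignificant, the first is a DO loop and the second assigns 1.25 to a variable named `DO5I`. The sketch below illustrates why the scanner has to look ahead past the `=` before it can decide what `DO` means. The function `classify_fortran_statement` is hypothetical and deliberately simplistic: it ignores parentheses, strings, and everything else a real FORTRAN scanner must handle.

```python
def classify_fortran_statement(line):
    """Guess whether a statement beginning with DO is a loop or an assignment.

    Returns (kind, left, right), where left and right are the blank-free
    text on either side of the first '='.
    """
    text = line.replace(" ", "")          # blanks are insignificant
    if not text.startswith("DO") or "=" not in text:
        return ("other", text, "")
    left, right = text.split("=", 1)
    if "," in right:                      # lookahead: a comma means loop bounds
        return ("do_loop", left, right)   # left spells DO, label 5, index I
    return ("assignment", left, right)    # left is one variable name, DO5I

loop = classify_fortran_statement("DO 5 I = 1,25")
assign = classify_fortran_statement("DO 5 I = 1.25")
```

The decision is made only after reading well past the characters `DO`, which is the lookahead cost this design imposes.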
15
Lexical Analysis in FORTRAN (Cont.)
16
Lookahead
17
Lexical Analysis in PL/I
18
Lexical Analysis in PL/I (Cont.)
• PL/I Declarations:
DECLARE (ARG1, ..., ARGN)
19
Lexical Analysis in C++
21
Next
• We still need
– A way to describe the lexemes of each token
22
Regular Languages
23
Languages
24
Examples of Languages
25
Notation
26
Atomic Regular Expressions
• Single character
27
Compound Regular Expressions
• Union
A + B = {s | s ∈ A or s ∈ B}
• Concatenation
AB = {ab | a ∈ A and b ∈ B}
• Iteration
A* = ∪_{i≥0} A^i, where A^i = AA...A (i times)
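The three operations can be tried out directly on finite sets of strings. This is a minimal sketch, assuming languages are represented as Python sets. The function names are illustrative, and because A* is infinite whenever A contains a non-empty string, `star_up_to` computes only the finite part up to a chosen bound `n`.

```python
def union(a, b):
    """A + B: strings that are in A or in B."""
    return a | b

def concat(a, b):
    """AB: every string of A followed by every string of B."""
    return {x + y for x in a for y in b}

def power(a, i):
    """A^i: A concatenated with itself i times; A^0 is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, a)
    return result

def star_up_to(a, n):
    """Finite approximation of A*: the union of A^i for 0 <= i <= n."""
    result = set()
    for i in range(n + 1):
        result |= power(a, i)
    return result
```

For example, `concat({"a", "b"}, {"c"})` gives `{"ac", "bc"}`, and `star_up_to({"a"}, 2)` gives `{"", "a", "aa"}`.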
28
Regular Expressions
AB = {ab | a ∈ A and b ∈ B}
30
Syntax vs. Semantics
[Figure: syntax (label) vs. semantics (content). The expression 'a' + 'b' denotes L('a' + 'b') = {a, b}; the expression 'a'* denotes L('a'*) = {ε, a, aa, aaa, ...}.]
31
Syntax vs. Semantics
L(ε) = {""}
L('c') = {"c"}
L(A + B) = L(A) ∪ L(B)
L(AB) = {ab | a ∈ L(A) and b ∈ L(B)}
L(A*) = ∪_{i≥0} L(A^i)
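As a worked illustration of how the meaning function unfolds, the expression ('a' + 'b') 'c' can be evaluated one rule at a time. The expression is chosen here only as an example, and the block assumes the `amsmath` package for `align*`.

```latex
\begin{align*}
L\bigl((\texttt{'a'} + \texttt{'b'})\,\texttt{'c'}\bigr)
  &= \{\, xy \mid x \in L(\texttt{'a'} + \texttt{'b'}) \text{ and } y \in L(\texttt{'c'}) \,\} \\
  &= \{\, xy \mid x \in L(\texttt{'a'}) \cup L(\texttt{'b'}) \text{ and } y \in L(\texttt{'c'}) \,\} \\
  &= \{\, xy \mid x \in \{\texttt{"a"}, \texttt{"b"}\} \text{ and } y \in \{\texttt{"c"}\} \,\} \\
  &= \{\texttt{"ac"}, \texttt{"bc"}\}
\end{align*}
```

Each step applies exactly one rule: concatenation first, then union, then the single-character rule.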
32
Segue
33
Example: Keyword
34
Example: Integers
digit = '0' + '1' + '2' + '3' + '4' + '5' + '6' + '7' + '8' + '9'
integer = digit digit*
Abbreviation: A^+ = AA*
Abbreviation: [0-2] = '0' + '1' + '2'
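Both abbreviations have direct counterparts in practical regex tools. The sketch below assumes Python's `re` module and checks that `digit digit*` and the abbreviated `digit+` accept the same sample strings. The names `INTEGER`, `INTEGER_PLUS`, and `is_integer` are illustrative.

```python
import re

DIGIT = "[0-9]"                                # range abbreviation for '0' + ... + '9'
INTEGER = re.compile(f"{DIGIT}{DIGIT}*")       # integer = digit digit*
INTEGER_PLUS = re.compile(f"{DIGIT}+")         # same language via the A+ abbreviation

def is_integer(s):
    """True if the whole string is a non-empty sequence of digits."""
    return INTEGER.fullmatch(s) is not None

samples = ["0", "2024", "", "12a", "007"]
# The two patterns should agree on every sample.
agree = all(
    (INTEGER.fullmatch(s) is None) == (INTEGER_PLUS.fullmatch(s) is None)
    for s in samples
)
```

Note that `+` inside a Python pattern means "one or more", not union; union is written `|`.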
35
Example: Identifier
36
Example: Whitespace
(' ' + '\n' + '\t')^+
37
Example: Phone Numbers
Σ = digits ∪ {-, (, )}
exchange = digit^3
phone = digit^4
area = digit^3
phone_number = '(' area ')-' exchange '-' phone
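The named sub-expressions translate almost one-for-one into a composed pattern. This is a sketch assuming Python's `re` module, where the exponent `digit^3` is written `{3}` and the parentheses must be escaped. The sample number in the usage line is made up for illustration.

```python
import re

DIGIT = "[0-9]"
AREA = DIGIT + "{3}"        # area = digit^3
EXCHANGE = DIGIT + "{3}"    # exchange = digit^3
PHONE = DIGIT + "{4}"       # phone = digit^4

# phone_number = '(' area ')-' exchange '-' phone
PHONE_NUMBER = re.compile(r"\(" + AREA + r"\)-" + EXCHANGE + "-" + PHONE)

def is_phone_number(s):
    """True if the whole string has the form (ddd)-ddd-dddd."""
    return PHONE_NUMBER.fullmatch(s) is not None
```

With these definitions, `is_phone_number("(555)-123-4567")` holds, while `"555-123-4567"` is rejected because the parentheses are part of the pattern.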
38
Example: Email Addresses
• Consider anyone@cs.stanford.edu
Σ = letters ∪ {., @}
name = letter^+
address = name '@' name '.' name '.' name
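The same composition style works for the address pattern. The sketch below assumes Python's `re` module and follows the slide's deliberately narrow definition: letters only, and exactly three dot-separated names after the `@`. It is not a general email validator.

```python
import re

NAME = "[A-Za-z]+"          # name = letter^+

# address = name '@' name '.' name '.' name
ADDRESS = re.compile(NAME + "@" + NAME + r"\." + NAME + r"\." + NAME)

def is_address(s):
    """True if the whole string matches the slide's address pattern."""
    return ADDRESS.fullmatch(s) is not None
```

The slide's own example `anyone@cs.stanford.edu` matches, but `anyone@stanford.edu` does not, since the pattern requires three names after the `@`.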
39
Example: Unsigned Pascal Numbers
40
Other Examples
• File names
• Grep tool family
41
Summary