TOKENS
Strings and Languages
• Regular Expressions are an important notation for specifying patterns.
Languages
• A language, L, is simply any set of strings over a
fixed alphabet.
Alphabet                          Languages
{0,1}                             {0,10,100,1000,100000…}
                                  {0,1,00,11,000,111,…}
{a,b,c}                           {abc,aabbcc,aaabbbccc,…}
{A,…,Z}                           {FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…,9,+,-,…,<,>,…}   {All legal PASCAL programs}
String operations
• Given String: banana
• Prefix : ban, banana
• Suffix : ana, banana
• Substring : nan, ban, ana, banana
• Subsequence: bnan, nn
• Proper Prefix and Suffix
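The terms above can be checked mechanically. The following is a minimal Python sketch; the helper names (`prefixes`, `suffixes`, `substrings`, `is_subsequence`, `proper`) are hypothetical, and `proper` assumes the common textbook definition that a proper prefix or suffix of s is non-empty and different from s (the slide does not spell the definition out).

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All contiguous pieces of s (a prefix of a suffix)."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_subsequence(x, s):
    """True if x can be obtained from s by deleting zero or more symbols."""
    remaining = iter(s)
    # `ch in remaining` advances the iterator, so order is preserved.
    return all(ch in remaining for ch in x)


def proper(parts, s):
    """Assumed definition: drop the empty string and s itself."""
    return [p for p in parts if p not in ("", s)]
```

For example, `is_subsequence("bnan", "banana")` holds even though `bnan` is not a substring of `banana`.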
String Operations
• Concatenation
– xy; sε = εs = s; ε is the identity for concatenation
– s^0 = ε; if i > 0, s^i = s^(i-1) s
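String exponentiation is defined recursively: the zeroth power is the empty string, and each further power appends one more copy of s. A small Python sketch of that recursion follows; the function name `power` is hypothetical, and the empty Python string `""` stands in for ε.

```python
def power(s, i):
    """Return s^i, using s^0 = ε and s^i = s^(i-1) s for i > 0."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return ""  # ε, the identity for concatenation
    return power(s, i - 1) + s
```

With this sketch, `power("ab", 3)` builds `ab`, then `abab`, then `ababab`.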
Operations on Languages
OPERATION                               DEFINITION
union of L and M, written L ∪ M         L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM    LM = {st | s is in L and t is in M}
Kleene closure of L, written L*         L* = ⋃ (i ≥ 0) L^i
positive closure of L, written L+       L+ = ⋃ (i ≥ 1) L^i
Say What?
L = {A, B, C, D } D = {1, 2, 3}
• L ∪ D
{A, B, C, D, 1, 2, 3 }
• LD
{A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 }
• L^2
{ AA, AB, AC, AD, BA, BB, BC, BD, CA, … DD}
• L*
{ All strings formed from the symbols of L, plus ε }
• L+
L* - {ε}
• L (L ∪ D)
Valid: {A1, AA, B3, CD} Invalid: {321, 4A2}
• L (L ∪ D)*
Valid: {A, A1, A23, D3, DA3, …} Invalid: {31}
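These examples can be reproduced with Python sets of strings. The sketch below is illustrative only: the function names are hypothetical, and because L* is infinite, `closure_up_to` computes only the finite approximation L^0 ∪ … ∪ L^n.

```python
def union(L, M):
    """L ∪ M: strings in L or in M."""
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s in L for t in M}


def lang_power(L, i):
    """L^i, with L^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def closure_up_to(L, n):
    """Finite approximation of L*: the union of L^0 through L^n."""
    result = set()
    for i in range(n + 1):
        result |= lang_power(L, i)
    return result


L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

letter_then_any = concat(L, union(L, D))                      # L (L ∪ D)
identifiers = concat(L, closure_up_to(union(L, D), 2))        # L (L ∪ D)*, up to length 3
```

Here `identifiers` contains `A`, `A1`, and `A23` but not `31`, since every member must start with a symbol from L.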
Regular Expressions
• A Regular Expression is a Set of Rules /
Techniques for Constructing Sequences of
Symbols (Strings) from an Alphabet.
Regular Expressions
• Defined over an alphabet Σ
Algebraic Properties of Regular
Expressions
AXIOM                       DESCRIPTION
r | s = s | r               | is commutative
r | (s | t) = (r | s) | t   | is associative
(r s) t = r (s t)           concatenation is associative
r (s | t) = r s | r t
(s | t) r = s r | t r       concatenation distributes over |
ε r = r
r ε = r                     ε is the identity element for concatenation
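One way to build intuition for these axioms is to compare the languages of both sides on all short strings. The sketch below uses Python's `re` module for this; the names `language`, `AXIOMS`, and `check_axioms`, the sample expressions for r, s, and t, and the length bound are all assumptions made for illustration. A bounded comparison is evidence for an identity, not a proof of it.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=5):
    """Strings over `alphabet` of length <= max_len that `pattern` matches entirely."""
    compiled = re.compile(pattern)
    words = ("".join(chars)
             for n in range(max_len + 1)
             for chars in product(alphabet, repeat=n))
    return {w for w in words if compiled.fullmatch(w)}


# Sample expressions, each wrapped in a non-capturing group so that
# substitution cannot change operator precedence.
R, S, T = "(?:a*)", "(?:ab)", "(?:b)"
EPS = "(?:)"  # an empty group matches only the empty string, playing the role of ε

AXIOMS = {
    "r|s = s|r": (f"{R}|{S}", f"{S}|{R}"),
    "r|(s|t) = (r|s)|t": (f"{R}|(?:{S}|{T})", f"(?:{R}|{S})|{T}"),
    "(rs)t = r(st)": (f"(?:{R}{S}){T}", f"{R}(?:{S}{T})"),
    "r(s|t) = rs|rt": (f"{R}(?:{S}|{T})", f"{R}{S}|{R}{T}"),
    "(s|t)r = sr|tr": (f"(?:{S}|{T}){R}", f"{S}{R}|{T}{R}"),
    "εr = r": (f"{EPS}{R}", R),
    "rε = r": (f"{R}{EPS}", R),
}


def check_axioms():
    """Map each axiom to True if both sides accept the same short strings."""
    return {name: language(lhs) == language(rhs)
            for name, (lhs, rhs) in AXIOMS.items()}
```

Python's `re` is a backtracking engine with features beyond regular expressions in the formal sense; only the union, concatenation, and closure operators are used here.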
Regular Definitions
• Names may be given to regular expressions; these
names can be used like symbols
• Let Σ be an alphabet of basic symbols. A regular
definition is a sequence of definitions of the form
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name, and each ri is a
regular expression over the symbols in Σ ∪ {d1, d2,
…, di-1}
Regular Definitions
• Example 1:
– letter → A | B | … | Z | a | b | … | z
– digit → 0 | 1 | … | 9
– id → letter (letter | digit)*
• Example 2
– digit → 0 | 1 | 2 | … | 9
– digits → digit digit*
– optional_fraction → . digits | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent
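A regular definition can be turned into an ordinary regular expression by substituting each name with its already-expanded body, in order. The Python sketch below does this for Example 1; the `expand` helper and the use of `str.format` placeholders such as `{letter}` are assumptions made for illustration, and character classes stand in for the long alternations.

```python
import re

# Each body may refer only to names defined earlier in the list.
DEFINITIONS = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}(?:{letter}|{digit})*"),
]


def expand(definitions):
    """Substitute earlier names into later bodies, returning name -> regex."""
    env = {}
    for name, body in definitions:
        # A reference to a name not yet defined raises KeyError,
        # which enforces the ordering rule of regular definitions.
        env[name] = "(?:" + body.format(**env) + ")"
    return env


EXPANDED = expand(DEFINITIONS)
ID = re.compile(EXPANDED["id"])
```

`ID.fullmatch("x1")` succeeds, while `ID.fullmatch("1x")` returns `None` because an id must begin with a letter. This sketch would need a different placeholder syntax for bodies that use `{m,n}` repetition counts.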
Regular Definitions
• Shorthand
– One or more instances: r+ denotes rr*
– Zero or one Instance: r? denotes r|ε
– Character classes: [a-z] denotes a|b|…|z
Example
• digit → 0 | 1 | 2 | … | 9
• digits → digit+
• optional_fraction → (. digits ) ?
• optional_exponent → ( E ( + | - ) ? digits ) ?
• num → digits optional_fraction optional_exponent
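The shorthand form of num maps almost directly onto Python's `re` syntax. The sketch below writes num both with the shorthand operators and in the longer form that uses explicit empty alternatives, so the two can be compared on sample inputs. The variable names are hypothetical, and the sample strings are chosen for illustration only.

```python
import re

digit = "[0-9]"

# Shorthand form: + for one or more, ? for zero or one.
digits = f"{digit}+"
optional_fraction = rf"(?:\.{digits})?"
optional_exponent = rf"(?:E[+-]?{digits})?"
NUM = re.compile(digits + optional_fraction + optional_exponent)

# Longer form: rr* instead of r+, and an empty alternative instead of ?.
digits_long = f"{digit}{digit}*"
fraction_long = rf"(?:\.{digits_long}|)"
exponent_long = rf"(?:E(?:\+|-|){digits_long}|)"
NUM_LONG = re.compile(digits_long + fraction_long + exponent_long)


def is_num(text):
    """True if the whole of `text` is a num under the shorthand definition."""
    return NUM.fullmatch(text) is not None
```

Under this definition `3.14` and `1.5E-3` are accepted, while `.5` and `1.` are rejected because digits are required on both sides of the decimal point.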
Limitations of Regular Expressions
• Some languages cannot be described by any regular
expression
• Cannot describe balanced or nested constructs
– Example, all valid strings of balanced parentheses
– This can be done with CFG
• Cannot describe repeated strings
– Example: {wcw | w is a string of a’s and b’s}
– This cannot be described by a CFG either
• Can be used to denote only a fixed or unspecified
number of repetitions.
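Both counterexamples are easy to recognize once the recognizer is allowed unbounded memory, which is what a regular expression lacks. The Python sketch below is illustrative only; the function names `is_balanced` and `is_wcw` are hypothetical. The first keeps a nesting counter that can grow without limit, and the second remembers the whole of w in order to compare it with the second copy.

```python
def is_balanced(text):
    """True if `text` consists only of properly nested parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
        else:
            return False   # only parentheses are allowed in this sketch
    return depth == 0


def is_wcw(text):
    """True if `text` has the form w c w, with w a string of a's and b's."""
    left, sep, right = text.partition("c")
    return sep == "c" and left == right and set(left) <= {"a", "b"}
```

For instance, `is_balanced("(()())")` and `is_wcw("abcab")` hold, while `is_balanced("())(")` and `is_wcw("abcba")` do not.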