LECTURE MATERIAL
Alphabet
A finite, non-empty set of symbols. We normally use the symbols a, b, c, . . .
with or without subscripts, or 0, 1, 2, . . ., etc., for the elements of an alphabet.
An alphabet is represented by the symbol Σ.
Examples:
The binary alphabet: Σ = {0, 1}
The set of all lower-case letters: Σ = {a, b, . . . , z}
Example: Σ = {a, b, c, d} is an alphabet, where ‘a’, ‘b’, ‘c’, and ‘d’ are
symbols.
Strings
A string is a finite sequence, possibly empty, of symbols drawn from some
alphabet Σ. Given any alphabet Σ, the shortest string that can be formed from
Σ is the empty string, which we will write as ε. The set of all possible strings
over an alphabet Σ is written as Σ*. A string over an alphabet Σ is a finite
sequence of symbols of Σ. Although one writes a sequence as (a1, a2, . . . , an),
in the present context, we prefer to write it as a1a2 · · · an, i.e. by juxtaposing
the symbols in that order.
Operations on Strings
Empty String: An empty string is a string with zero occurrences of symbols.
This string is denoted by ϵ and may be chosen from any alphabet whatsoever.
Length of a String: This is the number of positions for symbols in the string.
Example: 01101 has length 5.
Note that there are only two symbols (0 and 1) in the string 01101, but 5
positions for symbols. Length of string w is denoted by |w|.
Example: |011| = 3 and |ϵ| = 0
Concatenation
One of the most fundamental operations used for string manipulation is
concatenation. Let x = a1a2 · · · an and y = b1b2 · · · bm be two strings. The
concatenation of the pair x, y denoted by xy is the string a1a2 · · · anb1b2 · · ·
bm. Clearly, the binary operation concatenation on Σ∗ is associative, i.e., for all
x, y, z ∈ Σ∗, x(yz) = (xy)z. Thus, x(yz) may simply be written as xyz. Also,
since ε is the empty string, it satisfies the property εx = xε = x for any string x ∈
Σ∗. Hence, Σ∗ is a monoid with respect to concatenation.
If x = a1a2 · · · an, then |x| = n.
Essentially, the length of a string is obtained by counting the number of
symbols in the string. For example, |aab| = 3, |a| = 1. Note that |ε| = 0.
If we denote by An the set of all strings of length n over Σ, then one
can easily ascertain that |An| = |Σ|ⁿ and Σ∗ = A0 ∪ A1 ∪ A2 ∪ · · ·.
Hence, each An being a finite set, Σ∗ is a countably infinite set.
We say that x is a substring of y if x occurs in y, that is y = uxv for
some strings u and v. The substring x is said to be a prefix of y if u = ε.
Similarly, x is a suffix of y if v = ε.
Generalizing the notation used for number of occurrences of symbol a in a
string x, we adopt the notation |y|x as the number of occurrences of a string
x in y.
Powers of an Alphabet
If Σ is an alphabet, we can express the set of all strings of a certain length from
that alphabet by using the exponential notation: Σk is the set of strings of length k,
each of whose symbols is in Σ.
Examples:
Σ0 = { ϵ }, regardless of what alphabet Σ is. That is, ϵ is the only string of
length 0.
If Σ = {0, 1}, then:
1. Σ1 = {0, 1}
2. Σ2 = {00, 01, 10, 11}
3. Σ3 = {000, 001, 010, 011, 100, 101, 110, 111}
Note: do not confuse Σ and Σ1:
1. Σ is an alphabet; its members 0 and 1 are symbols
2. Σ1 is a set of strings; its members are strings (each one of length 1)
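The sets Σk above can be generated mechanically. The sketch below (a Python illustration, with the alphabet assumed to be a set of one-character strings) builds Σk as the k-fold product of Σ:

```python
from itertools import product

def sigma_k(sigma, k):
    """Sigma^k: the set of all strings of length k, each symbol drawn from sigma."""
    return {"".join(p) for p in product(sigma, repeat=k)}

binary = {"0", "1"}
print(sorted(sigma_k(binary, 0)))  # [''] -- epsilon is the only string of length 0
print(sorted(sigma_k(binary, 2)))  # ['00', '01', '10', '11']
```

Note that Σ0 comes out as {ϵ} automatically, since the empty product has exactly one (empty) tuple.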
Kleene Star
The Kleene star, Σ*, is a unary operator on a set of symbols or strings, Σ, that
gives the infinite set of all possible strings of all possible lengths over Σ,
including λ.
i.e. Σ* = Σ0 U Σ1 U Σ2 U……. where Σp is the set of all possible strings
of length p.
Example: If Σ = {a, b}, Σ* = {λ, a, b, aa, ab, ba, bb,………..}
Σ*: The set of all strings over an alphabet, e.g. {0, 1}* = { ϵ, 0, 1, 00, 01, 10, 11,
000, . . .}
Σ* = Σ0 ∪ Σ1 ∪ Σ2 ∪ ...
The symbol ∗ is called the Kleene star and is named after the mathematician and
logician Stephen Cole Kleene.
Functions on Strings
The length of a string s, which we will write as |s|, is the number of symbols in
s. For example:
|ϵ| = 0
|10011001| = 8
For any symbol c and string s, we define the function #c(s) to be the number of
times that the symbol c occurs in s. E.g. #a(abbaaa) = 4.
The concatenation of two strings s and t, written s || t or simply st, is the string
formed by appending t to s. For example, if x = good and y = bye, then
xy = goodbye. So |xy| = |x| + |y|.
The empty string, ϵ, is the identity for concatenation of strings.
So ∀x (ϵx = xϵ = x).
Concatenation, as a function defined on strings, is associative.
So ∀s, t, w ((st)w = s(tw)).
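The three functions above (length, symbol count, concatenation) map directly onto built-in string operations; a quick sketch in Python:

```python
def count_symbol(c, s):
    """#c(s): the number of times the symbol c occurs in s."""
    return s.count(c)

x, y = "good", "bye"
assert x + y == "goodbye"                # concatenation
assert len(x + y) == len(x) + len(y)     # |xy| = |x| + |y|
assert count_symbol("a", "abbaaa") == 4  # #a(abbaaa) = 4
assert "" + x == x + "" == x             # the empty string is the identity
```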
String Replication
For each string w and each natural number i, the string wi is defined as:
w0 = ϵ
wi+1 = wi w
For example:
a3 = aaa
(bye)2 = byebye
a0b3 = bbb
String Reversal
For each string w, the reverse of w, which we will write wR, is defined as:
if |w| = 0 then wR = w = ϵ
if |w| ≥ 1 then ∃a ∈ Σ (∃u ∈ Σ* (w = ua)). (i.e., the last character of w is a.)
Then wR = a uR.
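Both inductive definitions translate directly into code; a sketch in Python following the w0/wi+1 and wR recurrences above:

```python
def replicate(w, i):
    """w^i: w^0 = '' and w^(i+1) = w^i w."""
    return "" if i == 0 else replicate(w, i - 1) + w

def reverse(w):
    """w^R: if |w| = 0 then w^R = w; if w = ua then w^R = a u^R."""
    if len(w) == 0:
        return w
    u, a = w[:-1], w[-1]
    return a + reverse(u)

assert replicate("bye", 2) == "byebye"
assert replicate("a", 0) + replicate("b", 3) == "bbb"  # a0 b3 = bbb
assert reverse("abb") == "bba"
```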
Relations on Strings
A string s is a substring of a string t iff s occurs contiguously as part of t.
For example:
aaa is a substring of aaabbbaaa
aaaaaa is not a substring of aaabbbaaa
A string s is a proper substring of a string t iff s is a substring of t and s ≠ t.
Every string is a substring (although not a proper substring) of itself. The empty
string, ϵ, is a substring of every string.
Languages [Definitions]
(1) A language is a (finite or infinite) set of strings over a finite alphabet Σ.
(2) A language is a subset of Σ* for some alphabet Σ. It can be finite or
infinite.
(3) In order to define the notion of a language in a broad spectrum, it is felt that
it can be any collection of strings over an alphabet. Thus we define a language
over an alphabet Σ as a subset of Σ∗.
(4) If Σ is an alphabet, and L ⊆ Σ*, then L is a (formal) language over Σ.
When we are talking about more than one language, we will use the notation ΣL
to mean the alphabet from which the strings in the language L are formed.
“Given some string s and some language L, is s in L?”
Examples:
(1) Let Σ = {a, b}. Σ* = {ϵ, a, b, aa, ab, ba, bb, aaa, aab, …}.
Some examples of languages over Σ are:
∅, {ϵ}, {a, b}, {ϵ, a, aa, aaa, aaaa, aaaaa}, {ϵ, a, aa, aaa, aaaa, aaaaa, …}
(2) If the language takes all possible strings of length 2 over Σ = {a, b},
then L = {aa, ab, ba, bb}
(3) Let L = {} = ∅. L is the language that contains no strings.
(4) The Empty Language is Different From the Empty String. Let L = {ϵ}, the
language that contains a single string, ϵ. Note that L is different from ∅.
(5) The empty set ∅ is a language over any alphabet. Similarly, {ε} is also
a language over any alphabet.
(6) The set of all strings over {0, 1} that start with 0.
(7) The set of all strings over {a, b, c} having ac as a substring. Note that
∅ ≠ {ε}, because the language ∅ does not contain any string but {ε} contains a
string, namely ε. Also it is evident that |∅| = 0; whereas, |{ε}| = 1.
Since languages are sets, we can apply the various well-known set operations to
them: union, intersection, difference, and complementation.
NOTE: ∅ ≠ { ϵ } since ∅ has no strings and { ϵ } has one
(11) {w | w consists of an equal number of 0 and 1}
(12) {0n1n | n ≥ 1}
(13) {0i1j | 0 ≤ i ≤ j}
Intersection: Suppose L1 and L2 are languages over some common alphabet,
the intersection L1 ∩ L2 of L1 and L2 consists of all strings which are
contained in both languages
The empty set Ø and the set {ϵ} are languages over every alphabet. Ø is a
language that contains no string. {ϵ} is a language that contains just the empty
string.
The union of two languages L1 and L2, denoted L1 ∪ L2, refers to the language
that consists of all the strings that are either in L1 or in L2, that is, to { x | x is in
L1 or x is in L2 }.
The intersection of L1 and L2, denoted L1 ∩ L2, refers to the language that
consists of all the strings that are both in L1 and L2, that is, to { x | x is in L1
and in L2 }.
The complementation of a language L over Σ, or just the complementation of L
when Σ is understood, denoted L̄, refers to the language that consists of all the
strings over Σ that are not in L, that is, to { x | x is in Σ* but not in L }.
The difference of L1 and L2, denoted L1 - L2, refers to the language that
consists of all the strings that are in L1 but not in L2, that is, to { x | x is in L1
but not in L2 }.
The cross product of L1 and L2, denoted L1 × L2, refers to the set of all the
pairs (x, y) of strings such that x is in L1 and y is in L2, that is, to the relation
{ (x, y) | x is in L1 and y is in L2 }.
Example: If L1 = {ϵ, 1, 01, 11} and L2 = {1, 01, 101} then L1 - L2 = {ϵ, 11} and
L2 - L1 = {101}.
On the other hand, if L1 = {ϵ, 0, 1} and L2 = {01, 11}, then the cross product of
these languages is L1 × L2 = {(ϵ, 01), (ϵ, 11), (0, 01), (0, 11), (1, 01), (1, 11)},
and their composition is L1L2 = {01, 11, 001, 011, 101, 111}.
L - Ø = L, Ø - L = Ø, ØL = Ø, and {ϵ}L = L for each language L.
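Since languages are plain sets of strings, the worked examples above can be checked with set operations; a sketch in Python (concatenation written as a comprehension):

```python
def concat(A, B):
    """The composition AB: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}

L1 = {"", "1", "01", "11"}
L2 = {"1", "01", "101"}
assert L1 - L2 == {"", "11"}       # L1 - L2 from the example above
assert L2 - L1 == {"101"}          # L2 - L1

M1 = {"", "0", "1"}
M2 = {"01", "11"}
assert concat(M1, M2) == {"01", "11", "001", "011", "101", "111"}
assert concat(set(), M1) == set()  # the empty language annihilates
assert concat({""}, M1) == M1      # {epsilon} is the identity
```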
Li will also be used to denote the composing of i copies of a language L, where
L0 is defined as {ϵ}. The set L0 ∪ L1 ∪ L2 ∪ L3 ∪ . . . , called the Kleene closure or just
the closure of L, will be denoted by L*. The set L1 ∪ L2 ∪ L3 ∪ . . . , called the positive
closure of L, will be denoted by L+.
Li consists of those strings that can be obtained by concatenating i strings from
L. L* consists of those strings that can be obtained by concatenating an arbitrary
number of strings from L.
Example: Consider the pair of languages L1 = {ϵ, 0, 1} and L2 = {01, 11}. For
these languages L1² = {ϵ, 0, 1, 00, 01, 10, 11}, and L2³ = {010101, 010111,
011101, 011111, 110101, 110111, 111101, 111111}. In addition, ϵ is in L1*, in
L1+, and in L2* but not in L2+.
The operations above apply in a similar way to relations in Σ* × Δ*, when Σ and Δ
are alphabets.
Specifically, the union of the relations R1 and R2, denoted R1 ∪ R2, is the
relation { (x, y) | (x, y) is in R1 or in R2 }. The intersection of R1 and R2,
denoted R1 ∩ R2, is the relation { (x, y) | (x, y) is in R1 and in R2 }. The
composition of R1 with R2, denoted R1R2, is the relation { (x1x2, y1y2) | (x1,
y1) is in R1 and (x2, y2) is in R2 }.
Prefix Relation
We define the following languages in terms of the prefix relation on strings:
L1 = {w ∈ {a, b}*: no prefix of w contains b}
= {ϵ, a, aa, aaa, aaaa, aaaaa, aaaaaa, …}.
L2 = {w ∈ {a, b}*: no prefix of w starts with b}
= {w ∈ {a, b}*: the first character of w is a} ∪ {ϵ}.
L3 = {w ∈ {a, b}*: every prefix of w starts with b} = ∅.
L3 is equal to ∅ because ϵ is a prefix of every string. Since ϵ does not start with
b, no strings meet L3’s requirement.
Recall that we defined the replication operator on strings: For any string s and
integer n, sn = n copies of s concatenated together. For example,
(bye)2 = byebye. We can use replication as a way to define a language, rather
than a single string, if we allow n to be a variable, rather than a specific
constant.
Lexicographic Enumeration
Let L = {x {a,b}* : all a's precede all b's}. The lexicographic enumeration of
L is:
ϵ, a, b, aa, ab, bb, aaa, aab, abb, bbb, aaaa, aaab, aabb, abbb, bbbb, aaaaa, …
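The enumeration above can be generated by listing strings in shortlex order (shorter first, ties broken alphabetically) and keeping those in which no b precedes an a; a sketch:

```python
from itertools import count, islice, product

def shortlex(sigma):
    """Yield all strings over sigma: shorter first, ties broken alphabetically."""
    for n in count(0):
        for p in product(sorted(sigma), repeat=n):
            yield "".join(p)

def in_L(x):
    """All a's precede all b's, i.e. the substring 'ba' never occurs."""
    return "ba" not in x

first = list(islice((x for x in shortlex({"a", "b"}) if in_L(x)), 10))
print(first)  # ['', 'a', 'b', 'aa', 'ab', 'bb', 'aaa', 'aab', 'abb', 'bbb']
```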
L* always contains an infinite number of strings as long as L is not equal to
either ∅ or {ϵ} (i.e., as long as there is at least one nonempty string any number
of which can be concatenated together). If L = ∅, then L* = {ϵ}, since there
are no strings that could be concatenated to ϵ to make it longer. If L = {ϵ}, then
L* is also {ϵ}. It is sometimes useful to require that at least one element of L be
selected. So we define: L+ = L L*.
Another way to describe L+ is that it is the closure of L under concatenation.
Note that L+ = L* - {ϵ} iff ϵ ∉ L.
Examples
1. If L1 = {0, 1, 01} and L2 = {1, 00}, then
L1L2 = {01, 11, 011, 000, 100, 0100}.
2. For L1 = {b, ba, bab} and L2 = {ε, b, bb, abb}, we have
L1L2 = {b, ba, bb, bab, bbb, babb, baabb, babbb, bababb}.
Note:
1. Since concatenation of strings is associative, so is the concatenation of
languages. That is, for all languages L1, L2 and L3, (L1L2)L3 = L1(L2L3).
Hence, (L1L2)L3 may simply be written as L1L2L3.
2. The number of strings in L1L2 is always less than or equal to the
product of individual numbers, i.e. |L1L2| ≤ |L1||L2|.
3. L1 ⊆ L1L2 if and only if ε ∈ L2.
Proof. The “if” part is straightforward: if ε ∈ L2, then for any x ∈
L1, we have x = xε ∈ L1L2. On the other hand, suppose ε ∉ L2. Now, note
that a string x ∈ L1 of shortest length in L1 cannot be in L1L2. This is because,
if x = yz for some y ∈ L1 and a nonempty string z ∈ L2, then |y| < |x|, a
contradiction to our assumption that x is of shortest length in L1.
Hence L1 ⊈ L1L2.
4. Similarly, ε ∈ L1 if and only if L2 ⊆ L1L2.
We write Ln to denote the language which is obtained by concatenating
n copies of L. More formally,
L0 = {ε} and
Ln = Ln−1L, for n ≥ 1.
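The recurrence L0 = {ε}, Ln = Ln−1L can be computed directly; a sketch reproducing the L1² example from earlier:

```python
def concat(A, B):
    """The concatenation AB of two languages."""
    return {x + y for x in A for y in B}

def lang_power(L, n):
    """L^n by the recurrence: L^0 = {''} and L^n = L^(n-1) L for n >= 1."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

L1 = {"", "0", "1"}
assert lang_power(L1, 0) == {""}
assert lang_power(L1, 2) == {"", "0", "1", "00", "01", "10", "11"}
```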
Properties of Languages
The properties of languages with respect to the newly introduced operations:
concatenation, Kleene closure, and positive closure are as follows, L, L1, L2,
L3 and L4 are languages.
1. Recall that concatenation of languages is associative.
2. Since concatenation of strings is not commutative, we have L1L2 ≠ L2L1,
in general.
3. L{ε} = {ε}L = L.
4. L∅ = ∅L = ∅.
Proof. Let x ∈ L∅; then x = x1x2 for some x1 ∈ L and x2 ∈ ∅. But ∅ being
empty set cannot hold any element. Hence there cannot be any element x ∈ L∅
so that L∅ = ∅. Similarly, ∅L = ∅ as well.
5. Distributive Properties:
1. (L1 ∪ L2)L3 = L1L3 ∪ L2L3.
Proof. Suppose x ∈ (L1 ∪ L2)L3
⇒ x = x1x2, for some x1 ∈ L1 ∪ L2, and some x2 ∈ L3
⇒ x = x1x2, for some x1 ∈ L1 or x1 ∈ L2, and x2 ∈ L3
⇒ x = x1x2, for some x1 ∈ L1 and x2 ∈ L3,
or x1 ∈ L2 and x2 ∈ L3
⇒ x ∈ L1L3 or x ∈ L2L3
⇒ x ∈ L1L3 ∪ L2L3.
10. L∗L = LL∗ = L+.
Relation between languages, grammars and automata
TURING MACHINES
A Turing machine (TM) is a device that manipulates symbols on a strip of tape
according to a table of rules. Despite its simplicity, a Turing machine can be
adapted to simulate the logic of any computer algorithm, and is particularly
useful in explaining the functions of a CPU inside a computer. It is a
mathematical model which consists of an infinite length tape divided into cells
on which input is given. The "Turing" machine was described in 1936 by Alan
Turing who called it an "a-machine"(automatic machine). The Turing machine
is not intended as practical computing technology, but rather as a hypothetical
device representing a computing machine. Turing machines help computer
scientists understand the limits of mechanical computation.
A TM accepts a language if it enters into a final state for every string w of the
language. A language is recursively enumerable (generated by a Type-0 grammar)
if it is accepted by a Turing machine. A TM decides a language if it accepts it and
enters into a rejecting state for every input not in the language. A language is
recursive if it is decided by a Turing machine. There may be some cases where
a TM does not stop. Such a TM accepts the language, but it does not decide it.
Time and Space Complexity of a Turing Machine
For a Turing machine, the time complexity refers to the measure of the number
of times the tape moves when the machine is initialized for some input symbols
and the space complexity is the number of cells of the tape written.
Time complexity (for all reasonable functions):
T(n) = O(n log n)
The TM's space complexity:
S(n) = O(n)
Example 1
Design a TM to recognize all strings consisting of an odd number of α’s.
Solution
The Turing machine M can be constructed by the following moves:
Let q1 be the initial state.
If M is in q1; on scanning α, it enters the state q2 and writes B (blank).
If M is in q2; on scanning α, it enters the state q1 and writes B (blank).
From the above moves, we can see that M enters the state q1 if it scans an even
number of α’s, and it enters the state q2 if it scans an odd number of α’s. Hence
q2 is the only accepting state.
Hence, M = ({q1, q2}, {α}, {α, B}, δ, q1, B, {q2}),
where δ is given by:
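Since this machine only moves right and never revisits a cell, its behaviour can be simulated by tracking the state alone. The table below is a reconstruction from the moves listed above (an illustrative sketch, with 'a' standing for α):

```python
# Reconstructed transition table: (state, symbol) -> (new state, write, move)
delta = {
    ("q1", "a"): ("q2", "B", "R"),
    ("q2", "a"): ("q1", "B", "R"),
}

def accepts(w, start="q1", accepting={"q2"}):
    """Run the machine on w; accept iff it halts in an accepting state."""
    state = start
    for c in w:
        if (state, c) not in delta:
            return False
        state, _, _ = delta[(state, c)]
    return state in accepting

assert accepts("aaa")      # odd number of a's -> accepted
assert not accepts("aa")   # even number of a's -> rejected
```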
Example 2
Design a Turing Machine that reads a string representing a binary number and
erases all leading 0’s in the string. However, if the string comprises only 0’s,
it keeps one 0.
Solution
Let us assume that the input string is terminated by a blank symbol, B, at each
end of the string.
The Turing Machine, M, can be constructed by the following moves:
Let q0 be the initial state.
If M is in q0, on reading 0, it moves right, enters the state q1 and erases 0.
On reading 1, it enters the state q2 and moves right.
If M is in q1, on reading 0, it moves right and erases 0, i.e., it replaces 0’s
by B’s. On reaching the leftmost 1, it enters q2 and moves right. If it
reaches B, i.e., the string comprises only 0’s, it moves left and enters
the state q3.
If M is in q2, on reading either 0 or 1, it moves right. On reaching B, it
moves left and enters the state q4. This validates that the string comprises
only of 0’s and 1’s.
If M is in q3, it replaces B by 0, moves left and reaches the final state qf.
If M is in q4, on reading either 0 or 1, it moves left. On reaching the
beginning of the string, i.e., when it reads B, it reaches the final state qf.
Hence, M = ({q0, q1, q2, q3, q4, qf}, {0, 1}, {0, 1, B}, δ, q0, B, {qf})
where δ is given by:
Non-Deterministic Turing Machine
A non-deterministic Turing machine can be formally defined as a tuple (Q, X,
Σ, δ, q0, B, F) where:
Q is a finite set of states
X is the tape alphabet
Σ is the input alphabet
δ is a transition function; δ : Q × X → P(Q × X × {Left_shift, Right_shift}).
q0 is the initial state
B is the blank symbol
F is the set of final states
In a Non-Deterministic Turing Machine, for every state and symbol, there are a
group of actions the TM can have. So, here the transitions are not deterministic.
The computation of a non-deterministic Turing Machine is a tree of
configurations that can be reached from the start configuration. An input is
accepted if there is at least one node of the tree which is an accept
configuration, otherwise it is not accepted. If all branches of the computational
tree halt on all inputs, the non-deterministic Turing Machine is called a Decider
and if for some input, all branches are rejected, the input is also rejected.
A Turing Machine with a semi-infinite tape has a left end but no right end. The
left end is limited with an end marker. It is a two-track tape:
1. Upper track: It represents the cells to the right of the initial head position.
2. Lower track: It represents the cells to the left of the initial head position in
reverse order.
The input string is initially written on the tape in contiguous tape
cells. The machine starts from the initial state q0 and the head scans from the
left end marker ‘End’. In each step, it reads the symbol on the tape under its
head. It writes a new symbol on that tape cell and then it moves the head either
into left or right one tape cell. A transition function determines the actions to be
taken. It has two special states called the accept state and the reject state. If at any
point in time it enters the accept state, the input is accepted, and if it
enters the reject state, the input is rejected by the TM. In some cases, it
continues to run infinitely, without accepting or rejecting, for certain
input strings.
Note: Turing machines with semi-infinite tape are equivalent to standard Turing
machines.
Example
Find out whether the following problem is decidable or not: Is a number ‘m’
prime?
Solution
Prime numbers = {2, 3, 5, 7, 11, 13, …………..}
Divide the number ‘m’ by all the numbers between ‘2’ and ‘√m’ starting from
‘2’. If any of these numbers produces a remainder of zero, then it goes to the
“Rejected state”; otherwise it goes to the “Accepted state”. So here the answer
is ‘Yes’ or ‘No’. Hence, it is a decidable problem.
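The decision procedure just described is ordinary trial division; a sketch:

```python
from math import isqrt

def is_prime(m):
    """Divide m by every number from 2 up to sqrt(m)."""
    if m < 2:
        return False
    for d in range(2, isqrt(m) + 1):
        if m % d == 0:     # remainder zero -> "Rejected state"
            return False
    return True            # no divisor found -> "Accepted state"

assert [m for m in range(2, 15) if is_prime(m)] == [2, 3, 5, 7, 11, 13]
```

The procedure always halts, which is exactly what makes the problem decidable.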
Several attempts were made in the first half of the 20th Century to formalize the
notion of computability:
(i) American mathematician Alonzo Church created a method for defining
functions called the λ-calculus.
(ii) British mathematician Alan Turing created a theoretical model for a
machine, now called a universal Turing machine, that could carry out
calculations from inputs.
(iii) Church, along with mathematician Stephen Kleene and logician J.B. Rosser
created a formal definition of a class of functions whose values could be
calculated by recursion.
All three computational processes (recursion, the λ-calculus, and the Turing
machine) were shown to be equivalent—all three approaches define the same
class of functions. This has led mathematicians and computer scientists to
believe that the concept of computability is accurately characterized by these
three equivalent processes. Informally the Church–Turing thesis states
that if some method (algorithm) exists to carry out a calculation, then the same
calculation can also be carried out by a Turing machine (as well as by a
recursively definable function, and by a λ-function).
The thesis can be stated as follows:
• Every effectively calculable function is a computable function.
Turing stated it this way:
• "It was stated ... that 'a function is effectively calculable if its values can be
found by some purely mechanical process.' We may take this literally,
understanding by a purely mechanical process one which could be carried
out by a machine. The development ... leads to ... an identification of
computability with effective calculability."
Problem 1
Construct a regular expression corresponding to the automata given below:
Solution
Here, the initial state and the final state is q1.
The equations for the three states q1, q2, and q3 are as follows:
q1 = q1a + q3a + є (the є move is because q1 is the initial state)
q2 = q1b + q2b + q3b
q3 = q2a
Now, we will solve these three equations:
q2 = q1b + q2b + q3b
= q1b + q2b + (q2a)b (substituting the value of q3)
= q1b + q2(b + ab)
= q1b(b + ab)* (applying Arden’s Theorem)
q1 = q1a + q3a + є
= q1a + q2aa + є (substituting the value of q3)
= q1a + q1b(b + ab)*aa + є (substituting the value of q2)
= q1(a + b(b + ab)*aa) + є
= є(a + b(b + ab)*aa)*
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
Problem 2
Construct a regular expression corresponding to the automata given below:
Solution:
Here the initial state is q1 and the final state is q2
Now we write down the equations:
q1 = q10 + є
q2 = q11 + q20
q3 = q21 + q30 + q31
Now, we will solve these three equations:
q1 = є0* [As εR = R]
So, q1 = 0*
q2 = 0*1 + q20
So, q2 = 0*1(0)* [By Arden’s theorem]
Hence, the regular expression is 0*10*.
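The derived expression can be sanity-checked against sample strings with Python's re module; 0*10* denotes exactly the strings over {0, 1} that contain a single 1:

```python
import re

pattern = re.compile(r"0*10*")  # the regular expression derived above

assert pattern.fullmatch("1")
assert pattern.fullmatch("0010")
assert not pattern.fullmatch("")      # no 1 at all
assert not pattern.fullmatch("0110")  # two 1's
```

Note the use of fullmatch rather than match, so that the whole string must belong to the language.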
Non-determinism means that there may be more than just one transition
available to follow, given an input signal, state, and stack symbol. If in every
situation only one transition is available as continuation of the computation,
then the result is a deterministic pushdown automaton (DPDA), a strictly
weaker device. Unlike finite-state machines, a NDPDA cannot, in general, be
converted into an equivalent DPDA.
If we allow a finite automaton access to two stacks instead of just one, we
obtain a more powerful device, equivalent in power to a Turing machine. A
linear bounded automaton is a device which is more powerful than a pushdown
automaton but less so than a Turing machine.
PDA Transitions:
Second:
δ(q, ε, z) = {(p1,γ1), (p2,γ2),…, (pm,γm)}
– Current state is q
– Current input symbol is not considered
– Symbol currently on top of the stack z
– Move to state pi from q
– Replace z with γi on the stack (leftmost symbol on top)
– No input symbol is read
Example: The Figure below shows the graphical representation of the DFA
A = (Q, Σ, δ, q0, F), where
Q = {q0, q1, q2, q3}, Σ = {a, b}, F = {q0}, and δ is given by the following table
δ(q0, a) = q1 δ(q1, a) = q0 δ(q2, a) = q3 δ(q3, a) = q2
δ(q0, b) = q3 δ(q1, b) = q2 δ(q2, b) = q1 δ(q3, b) = q0
The first one is accepting, but the second one is not. The DFA recognizes the
language of all words over the alphabet {a, b} that contain an even number of
a’s and an even number of b’s. The DFA is in the states on the left, respectively
on the right, if it has read an even, respectively an odd, number of a’s. Similarly,
it is in the states at the top, respectively at the bottom, if it has read an even,
respectively an odd, number of b’s.
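The transition table of this DFA can be run directly; a sketch:

```python
delta = {
    ("q0", "a"): "q1", ("q1", "a"): "q0", ("q2", "a"): "q3", ("q3", "a"): "q2",
    ("q0", "b"): "q3", ("q1", "b"): "q2", ("q2", "b"): "q1", ("q3", "b"): "q0",
}

def dfa_accepts(w, start="q0", accepting={"q0"}):
    """Follow delta one symbol at a time; accept iff we end in F = {q0}."""
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in accepting

assert dfa_accepts("")        # zero a's and zero b's (both even)
assert dfa_accepts("abba")    # two a's, two b's
assert not dfa_accepts("ab")  # one a, one b
```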
Trap States
Consider the DFA with a trap state of the figure below over the alphabet {a, b,
c}.
The automaton recognizes the language {ab, ab}. The pink state on the left is
often called a trap state or a garbage collector: if a run reaches this state, it gets
trapped in it, and so the run cannot be accepting. DFAs often have a trap state
with many ingoing transitions, and this makes it difficult to find a nice graphical
representation. So when drawing DFAs we often omit the trap state. For
instance, we only draw the black part of the automaton in the Figure. Notice that
no information is lost: if a state q has no outgoing transition labeled by a, then
we know that δ(q, a) = qt, where qt is the trap state.
there is an accepting run on input w. The language recognized by A is the set
L(A) = {w ∈ Σ∗ | w is accepted by A}. The runs of NAs are defined as for DAs,
but substituting p0 ∈ Q0 for p0 = q0, and pi+1 ∈ δ(pi, ai) for pi+1 = δ(pi, ai).
Acceptance and the language recognized by a NA are defined as for DAs. A
nondeterministic finite automaton (NFA) is a NA with a finite set of states.
We often identify the transition function δ of a DA with the set of triples (q, a,
q′) such that q′ = δ(q, a), and the transition relation δ of a NFA with the set of
triples (q, a, q′) such that q′ ∈ δ(q, a); so we often write (q, a, q′) ∈ δ, meaning
q′ = δ(q, a) for a DA, or q′ ∈ δ(q, a) for a NA. If a NA has several initial states,
then its language is the union of the sets of words accepted by runs starting at
each initial state.
Example: The figure below shows a NFA A = (Q, Σ, δ, Q0, F) where Q = {q0,
q1, q2, q3}, Σ = {a, b}, Q0 = {q0}, F = {q3}, and the transition relation δ is
given by the following table
δ(q0, a) = {q1} δ(q1, a) = {q1} δ(q2, a) = ∅ δ(q3, a) = {q3}
δ(q0, b) = ∅ δ(q1, b) = {q1, q2} δ(q2, b) = {q3} δ(q3, b) = {q3}
A has no run for any word starting with a b. It has exactly one run for abb, and
four runs for abbb, namely
Two of these runs are accepting, the other two are not. L(A) is the set of words
that start with a and contain two consecutive bs.
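Nondeterminism can be simulated by tracking the set of states reachable after each input symbol; a sketch of the transition relation above:

```python
delta = {
    ("q0", "a"): {"q1"}, ("q1", "a"): {"q1"}, ("q2", "a"): set(), ("q3", "a"): {"q3"},
    ("q0", "b"): set(),  ("q1", "b"): {"q1", "q2"}, ("q2", "b"): {"q3"}, ("q3", "b"): {"q3"},
}

def nfa_accepts(w, start={"q0"}, accepting={"q3"}):
    """Accept iff some run of the NFA on w ends in an accepting state."""
    states = set(start)
    for c in w:
        states = set().union(*(delta[(q, c)] for q in states))
    return bool(states & accepting)

assert nfa_accepts("abb")       # starts with a and contains two consecutive b's
assert nfa_accepts("abbb")
assert not nfa_accepts("b")     # no run for words starting with b
assert not nfa_accepts("abab")  # no two consecutive b's
```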
PARSING
In computer science and linguistics, parsing, or, more formally, syntactic
analysis, is the process of analysing a text, made of a sequence of tokens (for
example, words), to determine its grammatical structure with respect to a given
(more or less) formal grammar. Parsing is also an earlier term for the
diagramming of sentences of natural languages, and is still used for the
diagramming of inflected languages, such as the Romance languages or Latin.
Parsing is a common term used in psycholinguistics when describing language
comprehension.
In this context, parsing refers to the way that human beings, rather than
computers, analyze a sentence or phrase (in spoken language or text) in terms of
grammatical constituents, identifying the parts of speech, syntactic relations,
etc. This term is especially common when discussing what linguistic cues help
speakers to parse garden path sentences.
Types of Parsing
(1) Top Down Parsing
Top-down parsing is a parsing strategy where one first looks at the highest level
of the parse tree and works down the parse tree by using the rewriting rules of a
formal grammar. LL parsers are a type of parser that uses a top-down parsing
strategy. Top-down parsing is a strategy of analysing unknown data
relationships by hypothesizing general parse tree structures and then
considering whether the known fundamental structures are compatible with the
hypothesis. It occurs in the analysis of both natural languages and computer
languages.
aaab
aaaεb (insert ε)
aaaAb A –>ε
aaAb A –>aA
aAb A –>aA
Ab A –>aA
AB B –> b
S S –> AB
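Read from bottom to top, the lines above reverse a derivation in the grammar S → AB, A → aA | ε, B → b (the grammar is inferred from the rules shown). A top-down, recursive-descent recognizer for this grammar can be sketched as:

```python
# Inferred grammar: S -> A B,  A -> a A | epsilon,  B -> b
def parse_A(s):
    """A -> a A | epsilon: consume a maximal run of a's, return the rest."""
    if s.startswith("a"):
        return parse_A(s[1:])
    return s

def parse_B(s):
    """B -> b: consume a single b, return the rest ('' on full consumption)."""
    return s[1:] if s.startswith("b") else None

def parse_S(s):
    """S -> A B: succeed iff the whole input is consumed."""
    rest = parse_B(parse_A(s))
    return rest == ""

assert parse_S("aaab")
assert parse_S("b")
assert not parse_S("aa")  # missing the final b
```

Each nonterminal becomes one function, which is the defining trait of top-down parsing.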