
the operation mentioned. The theorems are of the form "if certain languages are regular, and a language L is formed from them by certain operations such as union, intersection, etc., then L is also regular". In general, closure properties convey the fact that when one (or several) languages are regular, then certain related languages are also regular. The principal closure properties of regular languages are:

1. The union of two regular languages is regular. If L and M are regular languages, then so is L ∪ M.
2. The intersection of two regular languages is regular. If L and M are regular languages, then so is L ∩ M.
3. The complement of a regular language is regular. If L is a regular language over alphabet Σ, then Σ* - L is also a regular language.
4. The difference of two regular languages is regular. If L and M are regular languages, then so is L - M.
5. The reversal of a regular language is regular. The reversal of a string means that the string is written backward, i.e. the reversal of abcde is edcba. The reversal of a language is the language consisting of the reversals of all its strings, i.e. if L = {001, 110} then L^R = {100, 011}.
6. The closure (star) of a regular language is regular. If L is a regular language, then so is L*.
7. The concatenation of regular languages is regular. If L and M are regular languages, then so is LM.
8. The homomorphism of a regular language is regular. A homomorphism is a substitution of strings for symbols. Let the function h be defined by h(0) = a and h(1) = b; then h applied to 0011 is simply aabb.
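The two string-level operations used in properties 5 and 8 are easy to try out on finite languages. The following is a minimal Python sketch; the function names are illustrative only, and it works on finite sets of strings rather than on automata.

```python
def reverse_language(language):
    """Return L^R: every string of a finite language written backward."""
    return {w[::-1] for w in language}


def apply_homomorphism(h, w):
    """Apply h symbol by symbol: each symbol of w is replaced by its image string."""
    return "".join(h[symbol] for symbol in w)


# The examples from the text.
h = {"0": "a", "1": "b"}
print(sorted(reverse_language({"001", "110"})))  # ['011', '100']
print(apply_homomorphism(h, "0011"))             # aabb
```

This only illustrates the operations themselves; the closure theorems say more, namely that the resulting language is again accepted by some finite automaton.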

If h is a homomorphism on alphabet Σ and w = abcd…z is a string of symbols, then h(w) = h(a) h(b) h(c) h(d) … h(z). The mathematical definition of a homomorphism is h: Σ* → Γ* such that ∀ x, y ∈ Σ*, h(xy) = h(x) h(y).

A homomorphism can also be applied to a language by applying it to each of the strings in the language. Let L be a language over alphabet Σ, and h a homomorphism on Σ; then h(L) = { h(w) | w is in L }. The theorem can be stated as "If L is a regular language over alphabet Σ, and h is a homomorphism on Σ, then h(L) is also regular".

[Figure: homomorphism applied in forward direction.]

9. The inverse homomorphism of a regular language is regular. Suppose h is a homomorphism from some alphabet Σ to strings in another alphabet Τ, and L is a language over Τ. Then the inverse of h applied to L, written h⁻¹(L), is the set of strings w in Σ* such that h(w) is in L. The theorem states that "If h is a homomorphism from alphabet Σ to alphabet Τ, and L is a regular language over Τ, then h⁻¹(L) is also a regular language".
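One standard way to see why property 9 holds is to build a DFA for h⁻¹(L) from a DFA for L: keep the same states, and let the new machine, on input symbol a, make the move the old machine makes on the whole string h(a). The Python sketch below assumes a DFA is given as a tuple (states, delta, start, finals) with delta a dictionary; that representation, the function names, and the example language are illustrative choices, not part of the notes.

```python
def run(delta, state, word):
    """Extended transition function: follow delta over every symbol of word."""
    for symbol in word:
        state = delta[(state, symbol)]
    return state


def accepts(dfa, word):
    states, delta, start, finals = dfa
    return run(delta, start, word) in finals


def inverse_homomorphism_dfa(dfa, h):
    """DFA over the source alphabet accepting { w | h(w) is in L(dfa) }."""
    states, delta, start, finals = dfa
    new_delta = {(q, a): run(delta, q, h[a]) for q in states for a in h}
    return states, new_delta, start, finals


# Example (assumed): L = strings over {a, b} with an even number of a's.
dfa_L = (
    {"even", "odd"},
    {("even", "a"): "odd", ("odd", "a"): "even",
     ("even", "b"): "even", ("odd", "b"): "odd"},
    "even",
    {"even"},
)
h = {"0": "ab", "1": "aa"}
dfa_inv = inverse_homomorphism_dfa(dfa_L, h)

for w in ["", "0", "1", "00", "10"]:
    # Both columns agree: w is accepted exactly when h(w) is in L.
    image = "".join(h[c] for c in w)
    print(repr(w), accepts(dfa_inv, w), accepts(dfa_L, image))
```

With this choice of h, each 0 contributes one a and each 1 contributes two, so h⁻¹(L) is the set of strings over {0, 1} with an even number of 0's.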

[Figure: homomorphism applied in inverse direction.]

----------------------------------------------
Decision Properties of Regular Languages

Testing Emptiness of Regular Languages
Suppose R is a regular expression. There are four cases to consider, corresponding to the ways that R could be constructed.
R = R1 + R2: L(R) is empty if and only if both L(R1) and L(R2) are empty.
R = R1R2: L(R) is empty if and only if either L(R1) or L(R2) is empty.
R = R1*: L(R) is not empty; it always includes at least ∈.
R = (R1): L(R) is empty if and only if L(R1) is empty, since they are the same language.

Testing Membership in a Regular Language
We have a regular language and an input string w, and we want to check whether the language contains the string. If the regular language is represented by the finite automaton (Q, ∑, δ, q0, F), the string is accepted if δ^(q0, w) ∈ F.

-------------------------------
Myhill-Nerode Theorem
Myhill-Nerode theorem and minimization to eliminate useless states.

The Myhill-Nerode Theorem says the following three statements are equivalent:
1) The set L ⊆ ∑∗ is accepted by some FA. (We know this means L is a regular language.)
2) L is the union of some of the equivalence classes of a right invariant (with respect to concatenation) equivalence relation of finite index.
3) Let equivalence relation RL be defined by: xRLy if and only if for all z in ∑∗, xz is in L exactly when yz is in L. Then RL is of finite index.

The notation RL means an equivalence relation R over the language L. The notation RM means an equivalence relation R over a machine M. We know for every regular language L there is a machine M that exactly accepts the strings in L.
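The emptiness and membership tests from the decision-properties discussion above can be sketched in Python. The tuple encoding of regular expressions and the DFA dictionary are assumptions made for this illustration. The base cases (the empty-set expression is empty; ∈ and a single symbol are not) are the standard ones and are added here because the four cases in the text cover only the operators. Parentheses need no case of their own, since grouping is already expressed by the nesting of the tuples.

```python
def is_empty(r):
    """Decide whether L(r) is empty, by recursion on how r was constructed."""
    kind = r[0]
    if kind == "empty":                      # the expression for the empty language
        return True
    if kind in ("epsilon", "symbol"):        # a one-string language
        return False
    if kind == "union":                      # R1 + R2: empty iff both are empty
        return is_empty(r[1]) and is_empty(r[2])
    if kind == "concat":                     # R1 R2: empty iff either is empty
        return is_empty(r[1]) or is_empty(r[2])
    if kind == "star":                       # R1*: always contains the empty string
        return False
    raise ValueError(f"unknown expression kind: {kind!r}")


def is_member(delta, start, finals, w):
    """Membership test: run the DFA on w and check whether it ends in F."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


a = ("symbol", "a")
nothing = ("empty",)
print(is_empty(("union", a, nothing)))    # False
print(is_empty(("concat", a, nothing)))   # True
print(is_empty(("star", nothing)))        # False

# Assumed example DFA: strings over {0, 1} that end in 1.
delta = {("p", "0"): "p", ("p", "1"): "q", ("q", "0"): "p", ("q", "1"): "q"}
print(is_member(delta, "p", {"q"}, "0101"))  # True
print(is_member(delta, "p", {"q"}, "10"))    # False
```

The membership test takes time proportional to the length of w, because the DFA makes exactly one move per input symbol.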

Think of an equivalence relation as being true or false for a specific pair of strings x and y. Thus xRy is true for some set of pairs x and y. We will use a relation R such that:
xRx is true. This is known as reflexive.
xRy <=> yRx, i.e. x has a relation to y if and only if y has the same relation to x. This is known as symmetric.
xRy and yRz implies xRz. This is known as transitive.

Our RL is defined xRLy <=> for all z in ∑∗ (xz in L <=> yz in L).
Our RM is defined xRMy <=> xzRMyz for all z in ∑∗. In other words δ(q0, xz) = δ(δ(q0, x), z) = δ(δ(q0, y), z) = δ(q0, yz) for x, y and z strings in ∑∗.

RM divides the set ∑∗ into equivalence classes, one class for each state reachable in M from the starting state q0. To get RL from this we have to consider only the final reachable states of M. From this theorem comes the provable statement that there is a smallest FA, with the fewest number of states, for every regular language.

----------------------------
Table filling algorithm
Start with a machine M = (Q, ∑, δ, q0, F) as usual. Remove from Q, F and δ all states that cannot be reached from q0. Remember a DFA is a directed graph with states as nodes. Thus use a depth first search to mark all the reachable states. The unreachable states, if any, are then eliminated and the algorithm proceeds.

1) For p in F and q in Q-F put an "X" in the table at (p, q). This is the initialization step. Do not write over dashes. These matrix locations will never change. An X or x at (p, q) in the matrix means states p and q are distinct in the minimum machine.
2) Take a string w (preferably a string with a single character or ∈). Apply it starting from any two states p and q; if, after processing the string w, exactly one of the two states reached is in F, then put an "X" in the table at (p, q). Now we can say that the states p and q are distinguishable.
3) After all (p, q)'s are checked, follow the recursive rule: if applying w to states (p, q) leads to (r, s) respectively, with both r and s ∉ F, but (r, s) were earlier proved to be distinguishable, then put an "X" in the table at (p, q).

Example:
Q = {q0, q1, q2, q3, q4, q5, q6, q7, q8}
∑ = {a, b}
q0 = q0
F = {q2, q3, q5, q6}; note Q-F = {q0, q1, q4, q7, q8}

δ    q0  q1  q2  q3  q4  q5  q6  q7  q8
a    q1  q2  q7  q8  q5  q7  q7  q7  q8
b    q4  q3  q8  q7  q6  q8  q8  q7  q8

Now, build the table labeling the "p" rows q0, ..., q7 and labeling the "q" columns q1, ..., q8 (a dot marks a location not yet filled in):

      q1  q2  q3  q4  q5  q6  q7  q8
q0    .   .   .   .   .   .   .   .
q1    -   .   .   .   .   .   .   .
q2    -   -   .   .   .   .   .   .
q3    -   -   -   .   .   .   .   .
q4    -   -   -   -   .   .   .   .
q5    -   -   -   -   -   .   .   .
q6    -   -   -   -   -   -   .   .
q7    -   -   -   -   -   -   -   .

Now fill in for step 1) the pairs (p, q) such that p is in F and q is in (Q-F):
{ (q2, q0), (q2, q1), (q2, q4), (q2, q7), (q2, q8),
  (q3, q0), (q3, q1), (q3, q4), (q3, q7), (q3, q8),
  (q5, q0), (q5, q1), (q5, q4), (q5, q7), (q5, q8),
  (q6, q0), (q6, q1), (q6, q4), (q6, q7), (q6, q8) }

      q1  q2  q3  q4  q5  q6  q7  q8
q0    .   X   X   .   X   X   .   .
q1    -   X   X   .   X   X   .   .
q2    -   -   .   X   .   .   X   X
q3    -   -   -   X   .   .   X   X
q4    -   -   -   -   X   X   .   .
q5    -   -   -   -   -   .   X   X
q6    -   -   -   -   -   -   X   X
q7    -   -   -   -   -   -   -   .

Now fill in more x's by checking all the cases in step 2 and applying step 3. Finish by filling in blank table locations with "O". For example (r, s) = (δ(p=q0, a), δ(q=q1, a)), so r = q1 and s = q2. Note that (q1, q2) has an X, thus (q0, q1) gets an "x".

      q1  q2  q3  q4  q5  q6  q7  q8
q0    x   X   X   x   X   X   x   x
q1    -   X   X   O   X   X   x   x
q2    -   -   O   X   O   O   X   X
q3    -   -   -   X   O   O   X   X
q4    -   -   -   -   X   X   x   x
q5    -   -   -   -   -   O   X   X
q6    -   -   -   -   -   -   X   X
q7    -   -   -   -   -   -   -   O

The "O" at (q1, q4) means {q1, q4} is a state in the minimum machine.
The "O" for (q2, q3), (q2, q5) and (q2, q6) means they are one state {q2, q3, q5, q6} in the minimum machine. Many other "O" just confirm this.
The "O" in (q7, q8) means {q7, q8} is one state in the minimum machine.

The resulting minimum machine is M' = (Q', ∑, δ', q0', F') with
Q' = { {q0}, {q1,q4}, {q2,q3,q5,q6}, {q7,q8} }, four states
F' = { {q2,q3,q5,q6} }, only one final state
q0' = {q0}

δ'               a                b
{q0}             {q1,q4}          {q1,q4}
{q1,q4}          {q2,q3,q5,q6}    {q2,q3,q5,q6}
{q2,q3,q5,q6}    {q7,q8}          {q7,q8}
{q7,q8}          {q7,q8}          {q7,q8}
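The whole procedure, run on the example machine above, can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the only way to organise the table: the table is held as a set of marked (distinguishable) pairs, and the recursive rule is applied by repeating passes until no new pair gets marked. All function and variable names are illustrative.

```python
from itertools import combinations


def reachable(delta, start, alphabet):
    """Depth first search over the transition graph from the start state."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for a in alphabet:
            r = delta[(p, a)]
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def distinguishable_pairs(states, alphabet, delta, finals):
    """Return the set of pairs {p, q} that end up marked in the table."""
    # Step 1: one state final, the other not.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    # Steps 2 and 3: mark (p, q) when some symbol leads to a marked pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked
                   for a in alphabet):
                marked.add(frozenset((p, q)))
                changed = True
    return marked


def equivalence_classes(states, marked):
    """Group states whose pair was never marked (the "O" entries)."""
    classes = []
    for p in states:
        for cls in classes:
            if frozenset((p, cls[0])) not in marked:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes


# The example machine from the text.
alphabet = "ab"
names = [f"q{i}" for i in range(9)]
rows = {"a": "q1 q2 q7 q8 q5 q7 q7 q7 q8".split(),
        "b": "q4 q3 q8 q7 q6 q8 q8 q7 q8".split()}
delta = {(names[i], a): rows[a][i] for a in alphabet for i in range(9)}
finals = {"q2", "q3", "q5", "q6"}

live = reachable(delta, "q0", alphabet)
states = [q for q in names if q in live]
marked = distinguishable_pairs(states, alphabet, delta, finals)
classes = equivalence_classes(states, marked)
print(classes)
# [['q0'], ['q1', 'q4'], ['q2', 'q3', 'q5', 'q6'], ['q7', 'q8']]
```

The 28 marked pairs correspond to the X and x entries of the finished table, and the 8 unmarked pairs to the "O" entries. Comparing a state only with the first member of each class is enough because state equivalence is transitive.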

Note on the δ' table: Fill in the first column of states first. Check that every state occurs in some set and in only one set. Since this is a DFA, the next columns must use exactly the state names found in the first column, e.g. q0 with input "a" goes to q1, but q1 is now {q1,q4}.

At the heart of the algorithm is the following: The sets Q-F and F are disjoint, thus the pairs of states (Q-F) X (F) are distinguishable, marked X. For the pairs of states (p, q) and (r, s) where r = δ(p, a) and s = δ(q, a): if r is distinguishable from s, then p is distinguishable from q, thus mark (p, q) with an x.

Testing equivalence of regular languages
To test the equivalence of two regular languages we can make use of the table-filling algorithm. For this, first convert each regular expression to a DFA. Imagine a DFA whose states are the union of the states of the DFAs obtained from the regular expressions. This DFA has two start states; we can take any one of them as the start state of the new DFA. Now check the two start states using the table-filling algorithm. If they are equivalent, we can conclude that the regular expressions are also equivalent. The following example makes the concept clear. Consider the following DFAs obtained from the regular expressions:

[Figure: the two DFAs.]

Let's imagine that these represent a single DFA, with states A to E (A is taken as start state). Applying the table-filling algorithm we get:

[Table: result of the table-filling algorithm.]

It shows that the two start states A and C are equivalent, and so we reach the conclusion that these two DFAs accept the same language, i.e. the regular languages are equivalent.

Can we apply the table-filling algorithm to minimize all NFAs? This can be easily concluded using the following example:

[Figure: example NFA.]

Applying the table-filling algorithm:

[Table: result of the table-filling algorithm.]

The state C is a redundant state, but this cannot be concluded from the table. So an NFA cannot be minimized using the table-filling algorithm.

Minimisation of Finite State Automata: Algorithm

The following algorithm generates a total FSA equivalent to the one we start off with but with the least possible number of states. Note, however, that useless and unreachable states would first have to be removed for the algorithm to work. Also, the finite state automaton must be deterministic.

1. First make the FSA total (see the relevant algorithm).
2. Relabel the nodes 1, 2, ..., n (where n is therefore the number of states).
3. Construct a table such that the entry (i, j) is TRUE if one of the states i, j is final while the other is not. The entry is FALSE otherwise (both i, j are final or both are non-final states).
4. Proceed with the table construction: if there is an input a such that starting from state i with input a takes us to a state i' and starting from state j with input a takes us to a state j' such that (i', j') is TRUE, then mark the entry (i, j) as TRUE. Otherwise, the entry (i, j) remains marked FALSE. Use the recursive rule whenever necessary.
5. We now know that state i is indistinguishable from state j if and only if (i, j) is FALSE. Join together the indistinguishable states.

----------------------------
Applications of Regular Expression

1. Regular expressions in Unix
In the UNIX operating system various commands use an extended regular expression language that provides shorthands for many common expressions. In this language we can write character classes to represent large sets of characters. (A character class is a pattern that defines a set of characters and matches exactly one character from that set.) There are some rules for forming these character classes:

The dot symbol (.) represents 'any character'.
The regular expression a+b+c+…+z is represented by [abc…z].
Within a character class representation, - can be used to define a set of characters in terms of a range. For example, a-z defines the set of lower-case letters and A-Z defines the set of upper-case letters. Some implementations allow the endpoints of a range to be specified in either order (i.e. both 0-9 and 9-0 define the set of digits), but many tools reject a reversed range, so it is safest to write the lower endpoint first.
If our expression involves operators such as minus, then we can place it first or last to avoid confusion with the range specifier, e.g. [-.0-9].
The special characters in the UNIX regular expression language can be represented as ordinary characters using the \ symbol, i.e. \ provides the usual escapes within character class brackets. Thus [[\]] matches either [ or ], because \ causes the first ] in the character class representation to be taken as a normal character rather than the closing bracket of the representation.

Special notations
[:digit:]    same as [0-9]
[:alpha:]    same as [A-Za-z]
[:alnum:]    same as [A-Za-z0-9]

Operators
|      Used in place of +
?      0 or 1 of; R? means 0 or 1 occurrence of R
+      1 or more of; R+ means 1 or more occurrences of R
{n}    n copies of; R{3} means RRR
^      Complement of

If the first character after the opening bracket of a character class is ^, the set defined by the remainder of the class is complemented with respect to the computer's character set. Using this notation, the character class represented by '.' can be described as [^\n]. If ^ appears as any character of a class except the first, it is not considered to be an operator. Thus [^abc] matches any character except a, b, or c, but [a^bc] or [abc^] matches a, b, c or ^.

When more than one expression can match the current character sequence, a choice is made as follows:
1. The longest match is preferred.
2. Among rules which match the same number of characters, the rule given first is preferred.

2. Lexical analysis
Compilers in a nutshell
Purpose: translate a program in some language (the source language) into a lower-level language (the target language).
Phases:
Lexical Analysis: Converts a sequence of characters into words, or tokens
Syntax Analysis: Converts a sequence of tokens into a parse tree
Semantic Analysis: Manipulates the parse tree to verify symbol and type information
Intermediate Code Generation: Converts the parse tree into a sequence of intermediate code instructions

Optimization: Manipulates intermediate code to produce a more efficient program
Final Code Generation: Translates intermediate code into final (machine/assembly) code

Overview of Lexical Analysis
• Convert character sequence into tokens, skip comments & whitespace
• Handle lexical errors
• Efficiency is crucial
• Tokens are specified as regular expressions, e.g. IDENTIFIER=[a-zA-Z][a-zA-Z0-9]*
• Lexical analyzers are implemented using regular expressions

There is a problem that more than one token may be recognized at once. Suppose the string else matches the regular expression for the keyword else as well as the expression for identifiers. This problem is resolved by giving priority to the first expression listed.
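The two disambiguation rules (longest match first, then the rule listed first) can be sketched with Python's `re` module. Only the IDENTIFIER pattern comes from the notes; the other rule names and patterns (ELSE, NUMBER, WHITESPACE) and the function name are hypothetical, chosen just to make the example self-contained. A production lexer would normally compile all rules into a single DFA rather than try each pattern in turn.

```python
import re

# Order matters: earlier rules win ties. ELSE, NUMBER and WHITESPACE are
# illustrative rules; IDENTIFIER is the pattern given in the text.
RULES = [
    ("ELSE", r"else"),
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("WHITESPACE", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so on a tie
            # the rule listed first is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise SyntaxError(f"lexical error at position {pos}: {text[pos]!r}")
        if best_name != "WHITESPACE":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("else elsewhere x1 42"))
# [('ELSE', 'else'), ('IDENTIFIER', 'elsewhere'), ('IDENTIFIER', 'x1'), ('NUMBER', '42')]
```

Here else is matched by both the keyword rule and the identifier rule with the same length, so the keyword rule wins because it is listed first, while elsewhere becomes an identifier because the identifier rule gives the longer match.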
