
THE CHOMSKY HIERARCHY (Reading: Section 12.7)

The Chomsky hierarchy, named for the linguist Noam Chomsky, who studied formal grammars, is a nested family of language classes, ranging from the very simple regular languages up through the very general recursively enumerable languages. It can be described as a hierarchy of languages (sets), a hierarchy of grammars, or a hierarchy of machines.

From the outermost (most general) class to the innermost (most restricted) class:
  Recursively Enumerable Languages
  Context-Sensitive Languages
  Context-Free Languages
  Regular Languages

Copyright 2010 Nancy Lynn Tinkham

Chomsky Hierarchy

| Languages | Grammars | Machines |
|---|---|---|
| Recursively Enumerable Languages | Type 0: Unrestricted Grammars | Turing Machines (deterministic or nondeterministic) |
| Context-Sensitive Languages | Type 1: Context-Sensitive Grammars | Linear-Bounded Automata (nondeterministic) |
| Context-Free Languages | Type 2: Context-Free Grammars | Pushdown Automata (nondeterministic) |
| Regular Languages | Type 3: Right-Linear Grammars | Finite Automata (deterministic or nondeterministic) |


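Chomsky Hierarchy: Grammars

Notation:
  $\Sigma$: set of terminals
  $V$: set of nonterminals (variables)

Type 0: Phrase-Structure Grammars (Unrestricted Grammars)
  Productions $\alpha \to \beta$: $\alpha, \beta \in (V \cup \Sigma)^*$

Type 1: Context-Sensitive Grammars
  Productions $\alpha \to \beta$: $\alpha, \beta \in (V \cup \Sigma)^*$
  Restriction: $|\alpha| \le |\beta|$
  Alternative formulation: productions $xAz \to xyz$, where $A \in V$; $x, y, z \in (V \cup \Sigma)^*$; $y \ne \varepsilon$

Type 2: Context-Free Grammars
  Productions $A \to \beta$: $A \in V$, $\beta \in (V \cup \Sigma)^*$

Type 3: Right-Linear Grammars
  Productions $A \to \beta$: $A \in V$, and $\beta$ is either $w$ or $wB$, where $w \in \Sigma^*$ and $B \in V$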
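The four production forms above can be checked mechanically. The following is a minimal sketch, assuming each grammar symbol is a single character and the empty string is written `""`; the function name `grammar_type` and the sample grammars are illustrative choices, not part of the course material.

```python
def grammar_type(productions, variables):
    """Return the most restricted type (3, 2, 1, or 0) whose production
    form is satisfied by every production.

    productions: list of (lhs, rhs) strings, one character per symbol.
    variables:   set of single-character nonterminals.
    """
    def right_linear(lhs, rhs):
        if lhs not in variables:          # left side must be one variable
            return False
        # rhs must be w or wB: drop an optional final variable, then
        # require that everything left is terminals.
        body = rhs[:-1] if rhs[-1:] in variables else rhs
        return all(sym not in variables for sym in body)

    def context_free(lhs, rhs):
        return lhs in variables           # a single variable on the left

    def context_sensitive(lhs, rhs):
        return len(lhs) <= len(rhs)       # the length restriction

    if all(right_linear(l, r) for l, r in productions):
        return 3
    if all(context_free(l, r) for l, r in productions):
        return 2
    if all(context_sensitive(l, r) for l, r in productions):
        return 1
    return 0


# Hypothetical sample grammars for illustration.
regular = [("S", "aS"), ("S", "bA"), ("A", "abA"), ("A", "")]
cfg = [("S", "aSb"), ("S", "ab")]
csg = [("S", "aAbc"), ("S", "abc"), ("A", "aAbC"), ("A", "abC"),
       ("Cb", "bC"), ("Cc", "cc")]
unrestricted = [("S", "aAb"), ("aA", "a")]

print(grammar_type(regular, {"S", "A"}))        # 3
print(grammar_type(cfg, {"S"}))                 # 2
print(grammar_type(csg, {"S", "A", "C"}))       # 1
print(grammar_type(unrestricted, {"S", "A"}))   # 0
```

The checks run from most restricted to least restricted, so a grammar is reported at the first form that all of its productions satisfy. The result describes the grammar as written; the language may still belong to a smaller class if an equivalent, more restricted grammar exists.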


Chomsky Hierarchy: Machines

Turing Machines (deterministic or nondeterministic)
  Memory: a machine state; an infinitely long tape

Linear-Bounded Automata
  Memory: a machine state; a tape the same length as the input (+ 2 endmarkers)

Pushdown Automata
  Memory: a machine state; a stack

Finite Automata
  Memory: a machine state


Context-Sensitive Languages

A note on CSGs and $\varepsilon$: Because grammar rules are not allowed to decrease length, CSGs cannot generate the empty string $\varepsilon$. Thus, strictly speaking, L is a Context-Sensitive Language if there is a CSG G such that either $L = L(G)$ or $L = L(G) \cup \{\varepsilon\}$.

Example: Context-Sensitive Grammar for $\{a^n b^n c^n \mid n > 0\}$
  $S \to aAbc \mid abc$
  $A \to aAbC \mid abC$
  $Cb \to bC$
  $Cc \to cc$

S and A generate the a's, b's, and c's in matching triples. (Each C will become a c eventually.) The Cb and Cc rules move the c's to the end of the string.

Sample derivation of aaabbbccc:
  $S \Rightarrow aAbc \Rightarrow aaAbCbc \Rightarrow aaabCbCbc \Rightarrow aaabbCCbc \Rightarrow aaabbCbCc \Rightarrow aaabbbCCc \Rightarrow aaabbbCcc \Rightarrow aaabbbccc$

Example: Linear-Bounded Automaton algorithm to accept $\{a^n b^n c^n \mid n \ge 0\}$
  Repeatedly:
    Replace the leftmost a by X
    Replace the leftmost b by Y
    Replace the leftmost c by Z
  until no more a's, b's, or c's remain.
  If this can be done, erase all X's, Y's, and Z's, write 1, and halt. (Otherwise, halt early with garbage on the tape.)
  The input must also have the form a*b*c*, which a single left-to-right scan can check before the marking begins.
  This doesn't require any extra tape space beyond the space for the input.
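The marking algorithm for the linear-bounded automaton can be sketched in a few lines. This is an illustration under stated assumptions, not a formal LBA: the tape is modeled as a Python list the same length as the input, the function name `lba_accepts` is hypothetical, and the check that the input has the form a*b*c* is an added assumption here, since the marking steps by themselves only compare counts.

```python
import re

def lba_accepts(w):
    """Decide membership in {a^n b^n c^n | n >= 0} by marking symbols in place."""
    # Assumption added for this sketch: reject inputs whose symbols are out of order.
    if not re.fullmatch(r"a*b*c*", w):
        return False

    tape = list(w)                      # the tape never grows beyond the input
    while any(sym in "abc" for sym in tape):
        # One round: mark the leftmost a, the leftmost b, and the leftmost c.
        for symbol, mark in (("a", "X"), ("b", "Y"), ("c", "Z")):
            if symbol not in tape:
                return False            # halt early: the counts do not match
            tape[tape.index(symbol)] = mark
    return True                         # every symbol was marked in a full round


print(lba_accepts("aaabbbccc"))  # True
print(lba_accepts("aabbc"))      # False
print(lba_accepts(""))           # True
```

All work happens by overwriting cells of `tape`, which mirrors the point that no space beyond the input is needed.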

All Context-Sensitive Languages are recursive (Turing-decidable):

It is possible for an LBA computation (or a CSG derivation) to go on forever without halting. However, there is an algorithm to determine whether a given string w can be generated by a given CSG.

Construct a tree containing all possible derivations of the grammar:
  S is the root.
  The children of S are the strings that can be derived from S in one rule application.
  The children of S's children are the strings that can be derived from S's children in one rule application.
  Etc.

Cut off a tree branch at string x (don't generate children of x) if:
  x is the input string w (in which case, accept); or
  x is identical to one of x's ancestors (in this case, any string derivable from x can be derived from x's ancestor); or
  |x| > |w| (rules never shorten a string, so w cannot be derived from x); or
  no rule is applicable to x.

This tree is finite. Why? Every node that is expanded contains a string over $V \cup \Sigma$ (a finite alphabet) of length at most |w|. There are only finitely many of these (at most $|V \cup \Sigma|^0 + |V \cup \Sigma|^1 + |V \cup \Sigma|^2 + \dots + |V \cup \Sigma|^{|w|}$). The same string may appear at several nodes, but the tree has a finite branching factor, and no string repeats along a path, so the length of a path is bounded by the number of such strings.

If w appears as a leaf in this tree, accept; else reject.


Illustration of the tree of derivations in the previous CSL proof

Sample grammar:
  $S \to aSb \mid ab$
  $ab \to ba$
  $ba \to ab$

Deriving string: abab

Tree of derivations:
  S
    aSb
      aaSbb (too long)
      aabb
        abab (matches)
    ab
      ba
        ab (duplicates ancestor)
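The derivation-tree procedure can be sketched as a breadth-first search. This is a minimal illustration, assuming single-character symbols and a start symbol `S`; the names `one_step` and `csg_generates` are hypothetical. One technique is swapped in: instead of comparing a string only against its own ancestors, the sketch keeps a single set of all strings seen so far, which prunes at least as much and gives the same answer.

```python
from collections import deque

def one_step(x, rules):
    """All strings derivable from x by one rule application."""
    results = set()
    for lhs, rhs in rules:
        start = x.find(lhs)
        while start != -1:
            results.add(x[:start] + rhs + x[start + len(lhs):])
            start = x.find(lhs, start + 1)
    return results

def csg_generates(rules, w, start="S"):
    """Decide whether the length-nondecreasing grammar `rules` derives w."""
    seen = {start}                       # replaces the per-branch ancestor check
    queue = deque([start])
    while queue:
        x = queue.popleft()
        if x == w:
            return True                  # w appears in the tree: accept
        for y in one_step(x, rules):
            # Prune strings that are too long or already explored.
            if len(y) <= len(w) and y not in seen:
                seen.add(y)
                queue.append(y)
    return False                         # finite search exhausted: reject


sample = [("S", "aSb"), ("S", "ab"), ("ab", "ba"), ("ba", "ab")]
print(csg_generates(sample, "abab"))     # True
print(csg_generates(sample, "abb"))      # False

abc = [("S", "aAbc"), ("S", "abc"), ("A", "aAbC"), ("A", "abC"),
       ("Cb", "bC"), ("Cc", "cc")]
print(csg_generates(abc, "aabbcc"))      # True
```

The search terminates for the same reason the tree is finite: only strings of length at most |w| over a finite alphabet are ever added to `seen`.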


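Right-Linear Grammars

Example: Right-Linear Grammar for a*b(ab)*
  $S \to aS$
  $S \to bA$
  $A \to abA$
  $A \to \varepsilon$

Sample derivation for aababab:
  $S \Rightarrow aS \Rightarrow aaS \Rightarrow aabA \Rightarrow aababA \Rightarrow aabababA \Rightarrow aababab$

Sample derivation for b:
  $S \Rightarrow bA \Rightarrow b$

Example: Right-Linear Grammar for (ab*a)*
  $S \to aB \mid \varepsilon$
  $B \to bB$
  $B \to aS$

Sample derivation for abaabba:
  $S \Rightarrow aB \Rightarrow abB \Rightarrow abaS \Rightarrow abaaB \Rightarrow abaabB \Rightarrow abaabbB \Rightarrow abaabbaS \Rightarrow abaabba$

Sample derivation for aa:
  $S \Rightarrow aB \Rightarrow aaS \Rightarrow aa$

Note: The variables in a right-linear grammar correspond approximately to the states in an FA.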
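The correspondence between variables and FA states can be made concrete by running a right-linear grammar like a nondeterministic automaton. The sketch below is an illustration only: the triple encoding `(A, w, B)` for a production A → wB (with `None` in place of B for A → w) and the name `rl_accepts` are assumptions of this example.

```python
def rl_accepts(productions, w, start="S"):
    """Decide whether a right-linear grammar generates w.

    productions: list of (variable, terminals, next_variable_or_None).
    A configuration (variable, position) plays the role of an NFA state
    together with the number of input symbols read so far.
    """
    frontier = {(start, 0)}
    seen = set()
    while frontier:
        var, pos = frontier.pop()
        if (var, pos) in seen:
            continue
        seen.add((var, pos))
        for lhs, terminals, nxt in productions:
            # The rule applies if it rewrites var and its terminals match the input here.
            if lhs != var or not w.startswith(terminals, pos):
                continue
            end = pos + len(terminals)
            if nxt is None:
                if end == len(w):
                    return True          # derivation ended exactly at the end of w
            else:
                frontier.add((nxt, end))
    return False


# Grammar for a*b(ab)*
g1 = [("S", "a", "S"), ("S", "b", "A"), ("A", "ab", "A"), ("A", "", None)]
# Grammar for (ab*a)*
g2 = [("S", "a", "B"), ("S", "", None), ("B", "b", "B"), ("B", "a", "S")]

print(rl_accepts(g1, "aababab"))   # True
print(rl_accepts(g1, "aba"))       # False
print(rl_accepts(g2, "abaabba"))   # True
```

Since there are only finitely many (variable, position) pairs, the loop always terminates, even for rules such as A → B that consume no input.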


Determinism and Nondeterminism

For which language families does nondeterminism make a difference?

| Machine | Languages of deterministic machines = languages of nondeterministic machines? |
|---|---|
| Turing Machine | Yes |
| LBA | Unknown |
| PDA | No |
| FA | Yes |


Closure Properties

| Closed under: | Union | Intersection | Complement |
|---|---|---|---|
| R.E. languages | Yes | Yes | No |
| Recursive languages | Yes | Yes | Yes |
| CSLs | Yes | Yes | Yes |
| CFLs | Yes | No | No |
| Regular languages | Yes | Yes | Yes |


Closure Justification

Results for recursive and r.e. languages were proved in an earlier chapter.

Union, for Context-Free Grammars (and Right-Linear Grammars): Given grammars with start symbols $S_1$ and $S_2$, rename variables if necessary so that no variable occurs in both. Add $S \to S_1 \mid S_2$ to the grammar, where S is a new start symbol.

Union, for CSGs: Replace each terminal a in the productions with a variable $X_a$, so that no left-hand side contains a terminal, and add the production $X_a \to a$ to the grammar. Rename variables (including the new $X_a$) if necessary so that no variable occurs in both grammars. Add $S \to S_1 \mid S_2$ to the grammar. The first step keeps the rules of one grammar from rewriting terminals produced by the other.

Intersection, for LBAs: Simulate LBAs M1 and M2 on a 2-track LBA, with one track for M1 and one for M2. (Two tracks are needed so that we still have access to the input string after the M1 simulation has ended.)

(Complement for LBAs is a complicated construction; we'll skip it.)

Non-intersection for CFLs: requires the Pumping Lemma.

Non-complement for CFLs follows from non-intersection. Why? $L_1 \cap L_2 = (L_1^C \cup L_2^C)^C$, so closure under complement, together with closure under union, would imply closure under intersection.

Complement of Regular Languages: In a deterministic FA, make all accepting states non-accepting, and vice versa.

Intersection of Regular Languages follows from complement and union. As above, $L_1 \cap L_2 = (L_1^C \cup L_2^C)^C$.
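The regular-language argument can be tried out on small automata. The sketch below is illustrative and rests on assumptions: the `DFA` class and the two sample machines are hypothetical, every DFA is complete (a transition for each state and symbol), and union is built with the standard product construction on DFAs instead of the grammar construction described above. Intersection is then obtained exactly as in the identity, from complement and union.

```python
from itertools import product

class DFA:
    """A complete DFA: delta maps every (state, symbol) pair to a state."""
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, w):
        state = self.start
        for symbol in w:
            state = self.delta[(state, symbol)]
        return state in self.accepting

def complement(m):
    # Swap accepting and non-accepting states.
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.accepting)

def union(m1, m2):
    # Product construction (swapped in for this sketch): run both machines in
    # parallel and accept if either one accepts. Assumes a shared alphabet.
    states = set(product(m1.states, m2.states))
    delta = {((p, q), s): (m1.delta[(p, s)], m2.delta[(q, s)])
             for (p, q) in states for s in m1.alphabet}
    accepting = {(p, q) for (p, q) in states
                 if p in m1.accepting or q in m2.accepting}
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accepting)

def intersection(m1, m2):
    # L1 ∩ L2 = complement(complement(L1) ∪ complement(L2))
    return complement(union(complement(m1), complement(m2)))


# Hypothetical sample machines over {a, b}.
even_as = DFA({0, 1}, "ab",
              {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_in_b = DFA({0, 1}, "ab",
                {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
both = intersection(even_as, ends_in_b)

print(both.accepts("aab"))   # True: two a's and ends in b
print(both.accepts("ab"))    # False: odd number of a's
```

The complement step depends on determinism and completeness: swapping accepting states in a nondeterministic or partial automaton does not, in general, give the complement language.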
