Parsing is the process of determining how a string might be derived using the productions (rewrite rules) of a given grammar. It can be used to check whether or not a string belongs to a given language. When a statement written in a programming language is input, a compiler parses it to check whether it is syntactically correct and, if it is, to extract its components. Finding an efficient parser is a nontrivial problem, and a great deal of research has been conducted on parser design. In this note basic parsing techniques are introduced with examples, and some of the problems involved in parsing are discussed together with brief explanations of some of the solutions to those problems.

Two basic approaches to parsing are top-down parsing and bottom-up parsing. In the top-down approach, a parser tries to derive the given string from the start symbol by rewriting nonterminals one by one using productions. The nonterminal on the left hand side of a production is replaced by its right hand side in the string being parsed. In the bottom-up approach, a parser tries to reduce the given string to the start symbol step by step using productions. The right hand side of a production found in the string being parsed is replaced by its left hand side. Let us see how the string aababaa might be parsed by these two approaches for the following grammar as an example:

S -> aSa | bSb | a | b
This grammar generates the palindromes of odd lengths.

Top-down approach proceeds as follows:
(1) The start symbol S is pushed into the stack without reading any input symbol.
(2) S is popped and aSa is pushed without reading any input symbol.
(3) As the first a in the input is read, a at the top of the stack is popped.
(4) S is popped and aSa is pushed without reading any input symbol.
(5) As the second a in the input is read, a at the top of the stack is popped.
(6) S is popped and bSb is pushed without reading any input symbol.
(7) As the first b in the input is read, b at the top of the stack is popped.
(8) S is popped and a is pushed without reading any input symbol.
(9) As the unread input symbols are read, abaa in the stack is popped one by one.
Since the stack is empty when the entire input string is read, the string is found to be in the language.

If we use configuration without state, that is, (unread portion of input, stack contents), this top-down parsing can be expressed as follows, with the top of the stack written leftmost and ε denoting the empty string:
(aababaa, ε) ⊢ (aababaa, S) ⊢ (aababaa, aSa) ⊢ (ababaa, Sa) ⊢ (ababaa, aSaa) ⊢ (babaa, Saa) ⊢ (babaa, bSbaa) ⊢ (abaa, Sbaa) ⊢ (abaa, abaa) ⊢ (baa, baa) ⊢ (aa, aa) ⊢ (a, a) ⊢ (ε, ε)

In general a PDA for top-down parsing has the following four types of transitions:
(a) For each production, pop the nonterminal on the left hand side of the production at the top of the stack and push its right hand side string.
(b) Pop the stack if the top of the stack matches the input symbol being read.
(c) Initially push the start symbol into the stack.
(d) Go to the final state if the entire input has been read and the stack is empty.

Bottom-up approach proceeds as follows:
(1) The string aababaa is read into the stack one by one from left until the middle a is reached.
(2) The middle a is replaced by S at the top of the stack without reading any input symbol.
(3) The second b is read and pushed into the stack.
(4) bSb at the top of the stack is replaced by S.
(5) The fourth a is read and pushed into the stack.
(6) aSa at the top of the stack is replaced by S.
(7) The last a is read and pushed into the stack.
(8) aSa at the top of the stack is replaced by S.
Since the stack has only S when the entire input string is read, the string is found to be in the language.

If we use configuration without state, this bottom-up parsing can be expressed as follows, with the top of the stack written rightmost and ε denoting the empty string:
(aababaa, ε) ⊢ (ababaa, a) ⊢ (babaa, aa) ⊢ (abaa, aab) ⊢ (baa, aaba) ⊢ (baa, aabS) ⊢ (aa, aabSb) ⊢ (aa, aaS) ⊢ (a, aaSa) ⊢ (a, aS) ⊢ (ε, aSa) ⊢ (ε, S)
Note that the rightmost symbol on the right hand side of a production appears at the top of the stack.

In general a PDA for bottom-up parsing has the following four types of transitions:
(a) Push the input symbol being read into the stack -- this is called shift.
(b) Replace the right hand side of a production at the top of the stack with its left hand side -- this is called reduce.
(c) Pop the stack if the top of the stack matches the input symbol being read.
(d) If the entire input has been read and only the start symbol is in the stack, then pop the stack and go to the final state.

The structure of a derivation of a string can be represented by a tree called a parse tree or a derivation tree. A parse tree has the start symbol at its root. Its internal nodes correspond to the nonterminals that appear in the derivation. The children of a node are the symbols appearing on the right hand side of the production used to rewrite the nonterminal corresponding to the node in the derivation. For example the following figure shows the parse tree of the string aababaa of the above example.

[Figure: parse tree of aababaa]
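The two stack procedures described above can be sketched in Python as a search over configurations (unread input, stack). This is only a minimal illustration, assuming the example grammar is S -> aSa | bSb | a | b. The names `top_down` and `bottom_up` are hypothetical, and the nondeterministic choices of the PDA are simulated here by simply trying every applicable move, which is not how a practical parser would work.

```python
# Assumed example grammar: S -> aSa | bSb | a | b
PRODUCTIONS = [("S", "aSa"), ("S", "bSb"), ("S", "a"), ("S", "b")]
START = "S"


def top_down(text):
    """Search top-down configurations; the stack top is the leftmost character."""
    seen = set()
    todo = [(text, START)]  # the start symbol is pushed before any input is read
    while todo:
        unread, stack = todo.pop()
        if (unread, stack) in seen:
            continue
        seen.add((unread, stack))
        if not unread and not stack:
            return True  # entire input read and stack empty
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        if top.isupper():
            # Replace the nonterminal on top by each possible right hand side.
            for lhs, rhs in PRODUCTIONS:
                if lhs == top:
                    todo.append((unread, rhs + rest))
        elif unread and unread[0] == top:
            todo.append((unread[1:], rest))  # pop a matching terminal
    return False


def bottom_up(text):
    """Search shift/reduce configurations; the stack top is the rightmost character."""
    seen = set()
    todo = [(text, "")]
    while todo:
        unread, stack = todo.pop()
        if (unread, stack) in seen:
            continue
        seen.add((unread, stack))
        if not unread and stack == START:
            return True  # entire input read and only the start symbol remains
        if unread:
            todo.append((unread[1:], stack + unread[0]))  # shift
        for lhs, rhs in PRODUCTIONS:
            if stack.endswith(rhs):
                todo.append((unread, stack[: len(stack) - len(rhs)] + lhs))  # reduce
    return False


print(top_down("aababaa"), bottom_up("aababaa"))  # True True
print(top_down("aabb"), bottom_up("aabb"))        # False False
```

The top-down search terminates for this grammar because every right hand side begins with a terminal, so each rewriting step must be followed by a successful match before the next one. That property does not hold for every grammar.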
Top-down parsing traverses the parse tree from the root down to the leaves, while bottom-up parsing goes from the leaves up to the root.

Example 2 for top-down and bottom-up parsing: Given the grammar [grammar missing in source], let us parse the expression a + b*c.

Top-down parsing: [trace missing in source]

Bottom-up parsing: [trace missing in source]
Note again that the rightmost symbol on the right hand side of a production appears at the top of the stack.

Difficulties in parsing

The main difficulty in parsing is nondeterminism. That is, at some point in the derivation of a string more than one production is applicable, though not all of them lead to the desired string, and one can not tell which one to use until after the entire string is generated. For example in the parsing of aababaa discussed above, when S is at the top of the stack and a is read in the top-down parsing, there are two applicable productions, namely S -> aSa and S -> a, and it is not possible to decide which one to choose with the information of the input symbol being read and the top of the stack. Similarly for the bottom-up parsing, it is impossible to tell when to apply the production S -> a with the same information as for the top-down parsing. Some of these nondeterminisms are due to the particular grammar being used, and they can be removed by transforming grammars to other equivalent grammars, while others are in the nature of the language the string belongs to. Below several of the difficulties are briefly discussed.

Factoring: Consider the following grammar: [grammar missing in source]. With this grammar when the string aababaa is parsed top-down, after the start symbol is rewritten in the first step, there is no easy way of telling which production to use for the next rewriting step. However, if we change this to the following grammar, which is equivalent to this grammar, this nondeterminism disappears: [grammar missing in source].
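As a rough illustration of the idea, the following Python sketch performs one round of factoring on the first symbol of each right hand side. The grammar S -> ab | ac used here is a hypothetical example chosen for brevity, not the grammar discussed in the text, and the function name `left_factor` and the primed nonterminal names are assumptions made for this sketch.

```python
from collections import defaultdict


def left_factor(grammar):
    """Apply one round of factoring on the first symbol of each right hand side.

    `grammar` maps a nonterminal to a list of right hand sides, each a
    non-empty tuple of symbols. New nonterminals get a prime appended to the
    name, which is assumed not to clash with existing names.
    """
    factored = {}
    for lhs, alternatives in grammar.items():
        groups = defaultdict(list)
        for rhs in alternatives:
            groups[rhs[0]].append(rhs)
        factored[lhs] = []
        fresh = lhs
        for first, group in groups.items():
            if len(group) == 1:
                factored[lhs].append(group[0])  # nothing to factor out
            else:
                fresh += "'"
                factored[lhs].append((first, fresh))
                # The remainders become the alternatives of the new
                # nonterminal; an empty tuple stands for the empty string.
                factored[fresh] = [rhs[1:] for rhs in group]
    return factored


# Hypothetical grammar S -> ab | ac becomes S -> aS', S' -> b | c.
print(left_factor({"S": [("a", "b"), ("a", "c")]}))
# {'S': [('a', "S'")], "S'": [('b',), ('c',)]}
```

After factoring, a top-down parser that sees a on the input has only one production for S to choose from, and the choice between b and c is postponed until the next input symbol is available.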
This transformation operation is called factoring, as the a's on the right hand sides of the productions in the original grammar are factored out, as seen in the new grammar.

Left-recursion: Consider the following grammar: [grammar missing in source]. When a string, say aaba, is parsed top-down for this grammar, after the start symbol is pushed into the stack, it needs to be replaced by the right hand side of some production. However, there is no simple way of telling which production to use, and a parser may go into an infinite loop, especially if it is given an illegal string (a string which is not in the language). This kind of grammar is called left-recursive. Left-recursions can be removed by replacing left-recursive pairs of productions with new pairs of productions as follows: If A -> Aα and A -> β are left-recursive productions, where β does not start with A, then replace them with productions of the standard form A -> βA' and A' -> αA' | ε, where A' is a new nonterminal and ε is the empty string. For example the left-recursive grammar given above can be transformed to the following non-recursive grammar: [grammar missing in source].

Ambiguous grammar: A context-free grammar is called ambiguous if there is at least one string that has more than one distinct leftmost derivation (or, equivalently, more than one parse tree). For example, a grammar for algebraic expressions in which one symbol represents an identifier produces two distinct derivations, each corresponding to a different parse tree, for a single expression [grammar, expression, derivations and parse trees missing in source].
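To make the notion of ambiguity concrete, the following Python sketch counts the parse trees of a string under a commonly used ambiguous expression grammar, E -> E + E | E * E | i, where i stands for an identifier. This grammar is an assumption made for illustration and is not necessarily the one used in the original example. The function name `count_parse_trees` is likewise hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | i (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of parse trees deriving tokens[lo:hi] from E."""
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees("i+i*i"))  # 2
print(count_parse_trees("i+i"))    # 1
```

The count of 2 for i+i*i reflects the two groupings (i+i)*i and i+(i*i). Any string with a count greater than 1 is a witness that the assumed grammar is ambiguous.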
Though some context-free languages are inherently ambiguous and no unambiguous grammars can be constructed for them, it is often possible to construct unambiguous context-free grammars for unambiguous context-free languages. For example, for the language of algebraic expressions given above, the following grammar is unambiguous: [grammar missing in source].

Nondeterministic language: Lastly there are context-free languages that can not be parsed by a deterministic PDA. Languages of this kind need nondeterministic PDAs. Hence guess work is necessary in selecting the right production at certain steps of their derivation. For example take the language of palindromes. When parsing strings for this language, the middle of a given string must be identified. But it can be shown that no deterministic PDA can do that.
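The role of the guess can be illustrated with a short Python sketch. It assumes the odd-length palindromes over a and b from the earlier example, and it simulates the nondeterministic choice of the middle by trying every position in turn. The function names `accepts_with_guess` and `is_odd_palindrome` are hypothetical.

```python
def accepts_with_guess(text, middle):
    """Run the push-then-pop strategy for one guessed middle position.

    Symbols before `middle` are pushed, the symbol at `middle` is skipped,
    and every later symbol must match the symbol popped from the stack.
    """
    stack = list(text[:middle])
    for symbol in text[middle + 1:]:
        if not stack or stack.pop() != symbol:
            return False
    return not stack


def is_odd_palindrome(text):
    """Simulate the nondeterministic guess by trying every possible middle."""
    return any(accepts_with_guess(text, m) for m in range(len(text)))


print(is_odd_palindrome("aababaa"))  # True
print([m for m in range(7) if accepts_with_guess("aababaa", m)])  # [3]
```

Only the guess at position 3 succeeds for aababaa. A machine that reads the input once from left to right has no way of knowing, when it reaches that position, that it is the middle, which is why the sketch has to try all positions.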