Lab 3: Using ML-Yacc

Zhong Zhuang dyzz@mail.ustc.edu.cn

How to write a parser?

- Write a parser by hand
- Use a parser generator
    - May not be as efficient as a hand-written parser
    - General and robust

How does it work?

    parser specification -> parser generator -> parser
    stream of tokens -> parser -> abstract syntax

ML-Yacc specification

- Three parts again

    User Declarations: declare values available in the rule actions
    %%
    ML-Yacc Definitions: declare terminals and non-terminals;
                         special declarations to resolve conflicts
    %%
    Rules: parser specified by CFG rules and associated semantic
           actions that generate abstract syntax

ML-Yacc Definitions

- specify type of positions
    %pos int * int
- specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
- specify end-of-parse token
    %eop EOF
- specify start symbol (by default, the nonterminal in the LHS of the first rule)
    %start prog

A Simple ML-Yacc File

    %%                                      <- grammar symbols
    %term NUM | PLUS | MUL | LPAR | RPAR
    %nonterm exp | fact | base
    %pos int
    %start exp
    %eop EOF
    %%                                      <- grammar rules
    exp  : fact             ()
         | fact PLUS exp    ()
    fact : base             ()
         | base MUL fact    ()
    base : NUM              ()
         | LPAR exp RPAR    ()

- The () after each rule is its semantic action (currently they do nothing)

- each nonterminal may have a semantic value associated with it
- when the parser reduces with (X ::= s)
    - a semantic action will be executed
    - the action uses semantic values from the symbols in s
- when parsing is completed successfully
    - the parser returns the semantic value associated with the start symbol
    - usually a syntax tree

- to use semantic values during parsing, we must declare symbol types:
    %term NUM of int | PLUS | MUL | ...
    %nonterm exp of int | fact of int | base of int
- the type of a semantic action must match the type declared for the nonterminal on the left-hand side of its rule

A Simple ML-Yacc File with Action

    %%                                      <- grammar symbols with type declarations
    %term NUM of int | PLUS | MUL | LPAR | RPAR
    %nonterm exp of int | fact of int | base of int
    %pos int
    %start exp
    %eop EOF
    %%                                      <- grammar rules with semantic actions
    exp  : fact             (fact)
         | fact PLUS exp    (fact + exp)
    fact : base             (base)
         | base MUL base    (base1 * base2)
    base : NUM              (NUM)
         | LPAR exp RPAR    (exp)

- computing an integer result via semantic actions
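To see what these actions compute, the following Standard ML sketch evaluates a token list by hand, with one function per nonterminal and the same arithmetic as the actions above. It only illustrates the semantic values; it is not what ML-Yacc generates (ML-Yacc produces a table-driven LR parser). The `token` datatype and the names `parseExp`, `parseFact`, `parseBase`, and `calc` are assumptions made for this example.

```sml
(* Hypothetical token type for this sketch; ML-Yacc generates its own. *)
datatype token = NUM of int | PLUS | MUL | LPAR | RPAR

exception ParseError

(* Each function returns (semantic value, remaining tokens). *)

(* exp : fact            (fact)
       | fact PLUS exp   (fact + exp) *)
fun parseExp ts =
  let val (f, rest) = parseFact ts
  in case rest of
       PLUS :: rest' =>
         let val (e, rest'') = parseExp rest'
         in (f + e, rest'') end
     | _ => (f, rest)
  end

(* fact : base            (base)
        | base MUL base   (base1 * base2) *)
and parseFact ts =
  let val (b1, rest) = parseBase ts
  in case rest of
       MUL :: rest' =>
         let val (b2, rest'') = parseBase rest'
         in (b1 * b2, rest'') end
     | _ => (b1, rest)
  end

(* base : NUM             (NUM)
        | LPAR exp RPAR   (exp) *)
and parseBase (NUM n :: rest) = (n, rest)
  | parseBase (LPAR :: rest) =
      (case parseExp rest of
         (e, RPAR :: rest') => (e, rest')
       | _ => raise ParseError)
  | parseBase _ = raise ParseError

(* The value of the start symbol is the result of the whole parse. *)
fun calc ts =
  case parseExp ts of
    (v, []) => v
  | _ => raise ParseError

val seven = calc [NUM 1, PLUS, NUM 2, MUL, NUM 3]   (* 1 + (2 * 3) *)
```

Each function returns the value that the matching action would attach to its nonterminal, so `calc` returns the value of the start symbol `exp`.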

Conflicts in ML-Yacc

- We often write ambiguous grammars
    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
- Example
    - Tokens from lexer: NUM PLUS NUM MUL NUM
    - State of parser: E+E
    - To be read: MUL NUM

Conflicts in ML-Yacc

- Tokens from lexer: NUM PLUS NUM MUL NUM, state of parser: E+E
- If we shift
    Shift     E+E*
    Shift     E+E*E
    Reduce    E+E
    Reduce    E
- Result is: E+(E*E)

Conflicts in ML-Yacc

- Tokens from lexer: NUM PLUS NUM MUL NUM, state of parser: E+E
- If we reduce
    Reduce    E
    Shift     E*
    Shift     E*E
    Reduce    E
- Result is: (E+E)*E

- This is a shift-reduce conflict
- We want E+E*E to be read as E+(E*E), because "*" has higher precedence than "+", so here we need to shift

Another shift-reduce conflict

- Tokens from lexer: NUM PLUS NUM PLUS NUM
- State of parser: E+E
- To be read: PLUS NUM
- If we shift
    Shift     E+E+
    Shift     E+E+E
    Reduce    E+E
    Reduce    E
  Result is: E+(E+E)
- If we reduce
    Reduce    E
    Shift     E+
    Shift     E+E
    Reduce    E
  Result is: (E+E)+E
- In this case we need to reduce, because "+" is left associative

Deal with shift-reduce conflicts

Deal with it! There are three options:

- let ML-Yacc complain
    - the default choice is to shift when it encounters a shift-reduce conflict
    - BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
- rewrite the grammar to eliminate ambiguity
    - can be complicated and less clear
- use Yacc precedence directives
    - %left, %right, %nonassoc

Precedence and Associativity

- precedence of a terminal is based on the order in which its associativity is declared (later declarations have higher precedence)
- precedence of a rule is the precedence of its rightmost terminal
    - eg: precedence of (E ::= E + E) == prec(+)
- a shift-reduce conflict is resolved as follows
    - prec(terminal) > prec(rule) ==> shift
    - prec(terminal) < prec(rule) ==> reduce
    - prec(terminal) = prec(rule) ==>
        - assoc(terminal) = left ==> reduce
        - assoc(terminal) = right ==> shift
        - assoc(terminal) = nonassoc ==> report as error
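The resolution rule above can be written down as a small Standard ML function. This is only a sketch of the rule as stated on the slide, not ML-Yacc's internal code; the types `assoc` and `decision`, the function `resolve`, and the numeric precedence levels 1 and 2 are assumptions chosen for illustration.

```sml
(* Hypothetical types for this sketch; not part of ML-Yacc's interface. *)
datatype assoc = Left | Right | NonAssoc
datatype decision = Shift | Reduce | Error

(* termPrec:  precedence of the lookahead terminal
   rulePrec:  precedence of the rule that could be reduced
   termAssoc: associativity declared for the lookahead terminal *)
fun resolve (termPrec : int, rulePrec : int, termAssoc : assoc) : decision =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else case termAssoc of
         Left => Reduce
       | Right => Shift
       | NonAssoc => Error

(* Assume %left PLUS is declared before %left MUL,
   so prec(PLUS) = 1 and prec(MUL) = 2 in this sketch. *)
val onMul  = resolve (2, 1, Left)   (* state E+E, next token MUL:  Shift  *)
val onPlus = resolve (1, 1, Left)   (* state E+E, next token PLUS: Reduce *)
```

The two bindings at the end correspond to the two conflicts discussed earlier: with E+E on the stack, seeing MUL leads to a shift and seeing PLUS leads to a reduce.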

    datatype exp = Int of int
                 | Add of exp * exp
                 | Sub of exp * exp
                 | Mul of exp * exp
                 | Div of exp * exp
    %%
    %left PLUS MINUS
    %left MUL DIV                           <- higher precedence
    %%
    exp : NUM               (Int NUM)
        | exp PLUS exp      (Add (exp1, exp2))
        | exp MINUS exp     (Sub (exp1, exp2))
        | exp MUL exp       (Mul (exp1, exp2))
        | exp DIV exp       (Div (exp1, exp2))
        | LPAR exp RPAR     (exp)
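As a quick check of what these declarations mean, here is the tree the actions would build for the tokens NUM PLUS NUM MUL NUM with values 1, 2, 3, given that MUL binds tighter than PLUS. The `eval` function is a hypothetical helper added for this sketch and is not part of the lab code.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp

(* Expected tree for 1 + 2 * 3 under %left PLUS MINUS, then %left MUL DIV. *)
val tree = Add (Int 1, Mul (Int 2, Int 3))

(* Hypothetical helper, not from the slides: evaluate a syntax tree. *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Sub (a, b)) = eval a - eval b
  | eval (Mul (a, b)) = eval a * eval b
  | eval (Div (a, b)) = eval a div eval b

val result = eval tree   (* 7 *)
```

If the two %left lines were swapped, the same tokens would instead produce Mul (Add (Int 1, Int 2), Int 3).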

Reduce-reduce Conflict

- This kind of conflict is more difficult to deal with
- Example
    sequence  ::= (empty)
                | maybeword
                | sequence word
    maybeword ::= (empty)
                | word
- When we get a "word" from the lexer,
    - word -> maybeword -> sequence (rule 1)
    - empty -> sequence, then sequence word -> sequence (rule 2)
- We have more than one way to get "sequence" from input "word"

Reduce-reduce Conflict

- A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.
- ML-Yacc reduces by the first rule
- Generally, a reduce-reduce conflict is not allowed in your ML-Yacc file
- We need to fix our grammar
    sequence ::= (empty)
               | sequence word

Summary of conflicts

- Shift-reduce conflict
    - resolve with precedence and associativity
    - shift by default
- Reduce-reduce conflict
    - reduce by first rule
    - Not allowed!

Lab3

- Your job is to finish a parser for the C language
    - Input: a ".c" file
    - Output: "Success!" if the ".c" file is correct
- File description
    - c.lex
    - c.grm
    - main.sml
    - call-main.sml
    - sources.cm
    - lab3.mlb
    - test.c

Using ML-Yacc

- Read the ML-Yacc Manual
- Run
    - Once you finish "c.grm" and "c.lex"
    - In command-line (using MLton's tools):
        mlyacc c.grm
        mllex c.lex
    - we will get "c.grm.sml", "c.grm.desc", "c.grm.sig" and "c.lex.sml"
- Then compile Lab3
    - Start SML/NJ and run CM.make "sources.cm";
    - or in command-line, mlton lab3.mlb
- To run lab3
    - In SML/NJ, Main.parse "test.c";
    - or in command-line, lab3 test.c

"Debug" ML-Yacc File

- When you run mlyacc, you'll see error messages if your ML-Yacc file has conflicts. For example,
    mlyacc c.grm
    2 shift/reduce conflicts
- open file "c.grm.desc" (this file is generated by mlyacc)
    - The beginning of this file
        2 shift/reduce conflicts
        error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
        error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    - the rest are all the states
        state 0:
            prog : . structs vdecs preds funcs
            MYSTRUCT    shift 3
            prog        goto 429
            structs     goto 1
            structdec   goto 2
            .           reduce by rule 12
- rule 12 means the 12th rule (counting from 0) in your ML-Yacc file

Use ML-Lex with ML-Yacc

- Most of the work in "c.lex" this time can be copied from Lab2
    - You can re-use the regular expressions and lexical rules
- Difference with Lab2
    - You have to define "token" in "c.grm"
    - each "%term" in "c.grm" automatically becomes a token constructor in "c.grm.sig"

    %term INT of int | EOF

    signature C_TOKENS =
    sig
      type ('a,'b) token
      type svalue
      val EOF: 'a * 'a -> (svalue,'a) token
      val INT: (int) * 'a * 'a -> (svalue,'a) token
    end
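To make the signature concrete, the sketch below shows how a lexer action would call these constructors: a terminal without a value takes a left and right position, and a terminal declared with "of int" takes the value first. `ToyTokens` is a hypothetical stand-in written for this example, not the structure that mlyacc generates, and the positions are assumed to be plain ints.

```sml
(* The signature from the slide, as generated into c.grm.sig. *)
signature C_TOKENS =
sig
  type ('a,'b) token
  type svalue
  val EOF : 'a * 'a -> (svalue,'a) token
  val INT : int * 'a * 'a -> (svalue,'a) token
end

(* Hypothetical stand-in for the generated token structure. *)
structure ToyTokens : C_TOKENS =
struct
  datatype svalue = VOID | INTVAL of int
  (* name, semantic value, left position, right position *)
  datatype ('a,'b) token = TOKEN of string * 'a * 'b * 'b
  fun EOF (l, r) = TOKEN ("EOF", VOID, l, r)
  fun INT (n, l, r) = TOKEN ("INT", INTVAL n, l, r)
end

(* What lexer actions would return for the text "42" at positions 10..12,
   followed by end of input. *)
val tok = ToyTokens.INT (42, 10, 12)
val eof = ToyTokens.EOF (12, 12)
```

In the real lab the lexer rules in "c.lex" return tokens built this way, using whatever structure name and position type your "c.grm" declares.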

Hints

- Read the ML-Yacc Manual
- Read the language specification
- Test a lot!
