Introduction to JavaCC

Cheng-Chia Chen


What is a parser generator
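[Figure: the character stream "Total = price + tax ;" enters the Scanner, which emits the tokens Total, =, price, +, tax and ;. The Parser turns these tokens into a parse tree: an assignment of the expression (id price) + (id tax) to Total. The parser generator (JavaCC) produces both the Scanner and the Parser from a lexical + grammar specification.]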

JavaCC
• JavaCC (Java Compiler Compiler) is a scanner and parser generator.
• It produces a scanner and/or a parser written in Java, and is itself written in Java.
• There are many parser generators.
  - yacc (Yet Another Compiler-Compiler) for the C programming language (dragon book chapter 4.9)
  - Bison from gnu.org
• There are also many parser generators written in Java.
  - JavaCUP
  - ANTLR
  - SableCC

More on classification of Java parser generators
• Bottom-up parser generator tools
  - JavaCUP
  - jay, YACC for Java: www.inf.uos.de/bernd/jay
  - SableCC, The Sable Compiler Compiler: www.sablecc.org
• Top-down parser generator tools
  - ANTLR, Another Tool for Language Recognition: www.antlr.org
  - JavaCC, Java Compiler Compiler: javacc.dev.java.net

Features of JavaCC
• Top-down LL(k) parser generator
• Lexical and grammar specifications in one file
• Tree-building preprocessor
  - with JJTree
• Extremely customizable
  - many different options selectable
• Document generation
  - by using JJDoc
• Internationalized
  - can handle full Unicode
• Syntactic and semantic lookahead

Features of JavaCC (cont'd)
• Permits extended BNF specifications
  - can use | * ? + () on the RHS
• Lexical states and lexical actions
• Case-sensitive/insensitive lexical analysis
• Extensive debugging capability
• Special tokens
• Very good error reporting

JavaCC Installation
• Download the file from https://javacc.dev.java.net/
• Unzip javacc-4.X.zip to a directory %JCC_HOME%
• Add the %JCC_HOME%\bin directory to your %path%.
  - The JavaCC tools (javacc, jjtree, jjdoc) are now invokable directly from the command line.

Steps to use JavaCC
• Write a JavaCC specification (.jj file)
  - Defines the grammar and actions in a file (say, calc.jj)
• Run javacc to generate a scanner and a parser
  - javacc calc.jj
  - Will generate parser, scanner, token, … java sources
• Write your program that uses the parser
  - For example, UseParser.java
• Compile and run your program
  - javac -classpath . *.java
  - java -cp . mainpackage.MainClass

Example 1: parse a spec of regular expressions and match it with input strings
• Grammar: re.jj
• Example
    % all strings ending in "ab"
    (a|b)*ab;
    aba;
    ababb;
• Our tasks:
  - For each input string (lines 3, 4), determine whether it matches the regular expression (line 2).

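The overall picture
[Figure: javaCC reads re.jj and generates REParserTokenManager and REParser. The input ("% comment", "(a|b)*ab;", "a;", "ab;") goes to REParserTokenManager, which passes tokens to REParser; MainClass drives the parser and prints the result.]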

Format of a JavaCC input Grammar
• javacc_options
• PARSER_BEGIN ( <IDENTIFIER>1 )
    java_compilation_unit
  PARSER_END ( <IDENTIFIER>2 )
• ( production )*

The input spec file (re.jj)
    options {
      USER_TOKEN_MANAGER=false;
      BUILD_TOKEN_MANAGER=true;
      OUTPUT_DIRECTORY="./reparser";
      STATIC=false;
    }

re.jj
    PARSER_BEGIN(REParser)
    package reparser;
    import java.lang.*;
    …
    import dfa.*;

    public class REParser {
      public FA tg = new FA();

      // output error message with current line number
      public static void msg(String s) {
        System.out.println("ERROR" + s);
      }

      public static void main(String args[]) throws Exception {
        REParser reparser = new REParser(System.in);
        reparser.S();
      }
    }
    PARSER_END(REParser)

re.jj (Token definition)
    TOKEN : {
        <SYMBOL:  ["0"-"9","a"-"z","A"-"Z"] >
      | <EPSILON: "epsilon" >
      | <LPAREN:  "(" >
      | <RPAREN:  ")" >
      | <OR:      "|" >
      | <STAR:    "*" >
      | <SEMI:    ";" >
    }

    SKIP : {
        < ( [" ","\t","\n","\r","\f"] )+ >
      | < "%" ( ~["\n"] )* "\n" > { System.out.println(image); }
    }

re.jj (productions)
    void S() : { FA d1; }
    {
      d1 = R() <SEMI>
      { tg = d1;
        System.out.println("------NFA");       tg.print();
        System.out.println("------DFA");       tg = tg.NFAtoDFA(); tg.print();
        System.out.println("------Minimize");  tg = tg.minimize(); tg.print();
        System.out.println("------Renumber");  tg = tg.renumber(); tg.print();
        System.out.println("------Execute");
      }
      testCases()
    }

re.jj
    void testCases() : {}
    { ( testCase() )+ }

    void testCase() : { String testInput; }
    {
      testInput = symbols() <SEMI>
      { tg.execute( testInput ); }
    }

    String symbols() :
    { Token token = null;
      StringBuffer result = new StringBuffer(); }
    {
      ( token = <SYMBOL> { result.append( token.image ); } )*
      { return result.toString(); }
    }

re.jj (regular expression)
    // R --> RUnit | RConcat | RChoice
    FA R() : { FA result; }
    {
      result = RChoice() { return result; }
    }

    FA RUnit() : { FA result; Token d1; }
    {
      ( <LPAREN> result = RChoice() <RPAREN>
      | <EPSILON>     { result = tg.epsilon(); }
      | d1 = <SYMBOL> { result = tg.symbol( d1.image ); }
      )
      { return result; }
    }

re.jj
    FA RChoice() : { FA result, temp; }
    {
      result = RConcat()
      ( <OR> temp = RConcat() { result = result.choice( temp ); } )*
      { return result; }
    }

    FA RConcat() : { FA result, temp; }
    {
      result = RStar()
      ( temp = RStar() { result = result.concat( temp ); } )*
      { return result; }
    }

    FA RStar() : { FA result; }
    {
      result = RUnit()
      ( <STAR> { result = result.closure(); } )*
      { return result; }
    }
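The productions above form a recursive-descent structure: choice calls concatenation, which calls star, which calls unit. The following is a minimal sketch of that same structure in plain Java. It is not the slides' FA class (whose API is only partly visible here): the class name ReSketch is hypothetical, the "epsilon" keyword and whitespace handling are left out, and instead of building an automaton it translates the expression into a java.util.regex pattern and lets the JDK do the matching.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of re.jj: mirrors RChoice/RConcat/RStar/RUnit
// but emits a java.util.regex pattern string instead of an FA object.
class ReSketch {
    private final String src;
    private int pos = 0;

    ReSketch(String src) { this.src = src; }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // Same character set as the SYMBOL token: digits and ASCII letters.
    private static boolean isSymbol(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // RChoice -> RConcat ( "|" RConcat )*
    String rChoice() {
        StringBuilder sb = new StringBuilder(rConcat());
        while (peek() == '|') { pos++; sb.append('|').append(rConcat()); }
        return sb.toString();
    }

    // RConcat -> RStar ( RStar )*
    String rConcat() {
        StringBuilder sb = new StringBuilder(rStar());
        while (isSymbol(peek()) || peek() == '(') sb.append(rStar());
        return sb.toString();
    }

    // RStar -> RUnit ( "*" )*
    String rStar() {
        String unit = rUnit();
        while (peek() == '*') { pos++; unit = "(?:" + unit + ")*"; }
        return unit;
    }

    // RUnit -> "(" RChoice ")" | SYMBOL   (the "epsilon" alternative is omitted in this sketch)
    String rUnit() {
        char c = peek();
        if (c == '(') {
            pos++;
            String inner = rChoice();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return "(?:" + inner + ")";
        }
        if (!isSymbol(c)) throw new IllegalArgumentException("unexpected input at " + pos);
        pos++;
        return String.valueOf(c);
    }

    // Parses the whole expression, then checks whether input matches it completely.
    static boolean matches(String re, String input) {
        ReSketch p = new ReSketch(re);
        String pattern = p.rChoice();
        if (p.pos != re.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return Pattern.matches(pattern, input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "aba", "abab", "ababb"}) {
            System.out.println(s + " -> " + matches("(a|b)*ab", s));
        }
    }
}
```

Each Java method corresponds to one production, so the loop in rChoice plays the role of ( <OR> RConcat() )* and the loop in rStar plays the role of ( <STAR> )*.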

Format of a JavaCC input Grammar
    javacc_input ::= javacc_options
                     PARSER_BEGIN ( <IDENTIFIER>1 )
                       java_compilation_unit
                     PARSER_END ( <IDENTIFIER>2 )
                     ( production )*
                     <EOF>
color usage:
  - blue --- nonterminal
  - <orange> - a token type
  - purple --- token lexeme (reserved word, i.e., consisting of the literal itself)
  - black -- meta symbols

Notes
• <IDENTIFIER> means any Java identifier like var, class2, …
  - IDENTIFIER means IDENTIFIER only.
• <IDENTIFIER>1 must = <IDENTIFIER>2
• java_compilation_unit is any Java code that as a whole can appear legally in a file.
  - must contain a main class declaration with the same name as <IDENTIFIER>1.
• Ex:
    PARSER_BEGIN ( MyParser )
    package mypackage;
    import myotherpackage…;
    public class MyParser { … }
    class MyOtherUsefulClass { … }
    …
    PARSER_END (MyParser)

The input and output of javacc
Input (MyLangSpec.jj):
    PARSER_BEGIN ( MyParser )
    package mypackage;
    import myotherpackage…;
    public class MyParser { … }
    class MyOtherUsefulClass { … }
    …
    PARSER_END (MyParser)
    …
Output of javacc:
    MyParser.java
    MyParserTokenManager.java
    MyParserConstants.java
    Token.java
    TokenMgrError.java
    ParseException.java

Notes:
• Token.java and ParseException.java are the same for all input and can be reused.
• The package declaration in *.jj is copied to all 3 outputs.
• The import declarations in *.jj are copied to the parser and token manager files.
• The parser file is assigned the file name <IDENTIFIER>1.java
• The parser file has contents:
    … class MyParser { … // … }

Lexical Specification with JavaCC

javacc options
    javacc_options ::= [ options { ( option_binding )* } ]
• option_binding is of the form:
  - <IDENTIFIER>3 = <java_literal> ;
  - where <IDENTIFIER>3 is not case-sensitive.
• Ex:
    options {
      USER_TOKEN_MANAGER=true;
      BUILD_TOKEN_MANAGER=false;
      OUTPUT_DIRECTORY="./sax2jcc/personnel";
      STATIC=false;
      LOOKAHEAD=2;
    }

More Options (default values in parentheses)
• LOOKAHEAD (1)
  - java_integer_literal
• CHOICE_AMBIGUITY_CHECK (2)
  - java_integer_literal, for A | B … | C
• OTHER_AMBIGUITY_CHECK (1)
  - java_integer_literal, for (A)*, (A)+ and (A)?
• STATIC (true)
• DEBUG_PARSER (false)
• DEBUG_LOOKAHEAD (false)
• DEBUG_TOKEN_MANAGER (false)
• OPTIMIZE_TOKEN_MANAGER (false)
  - java_boolean_literal
• OUTPUT_DIRECTORY (current directory)
• ERROR_REPORTING (true)

More Options
• JAVA_UNICODE_ESCAPE (false)
  - replace an escape such as \u2245 with the actual Unicode character (6 chars -> 1 char)
• UNICODE_INPUT (false)
  - input stream is in Unicode form
• IGNORE_CASE (false)
• USER_TOKEN_MANAGER (false)
  - generate TokenManager interface for user's own scanner
• USER_CHAR_STREAM (false)
  - generate CharStream.java interface for user's own input stream
• BUILD_PARSER (true)
  - java_boolean_literal
• BUILD_TOKEN_MANAGER (true)
• SANITY_CHECK (true)
• FORCE_LA_CHECK (false)
• COMMON_TOKEN_ACTION (false)
  - invoke
• CACHE_TOKENS (false)

Example: Figure 2.2
    regular expression                              token
    1. if                                           IF
    2. [a-z][a-z0-9]*                               ID
    3. [0-9]+                                       NUM
    4. ([0-9]+"."[0-9]*) | ([0-9]*"."[0-9]+)        REAL
    5. ("--"[a-z]*"\n") | (" " | "\n" | "\t")+      nonToken, WS
    6. .                                            error
• javacc notations
    1. "if" or "i" "f" or ["i"]["f"]
    2. ["a"-"z"](["a"-"z","0"-"9"])*
    3. (["0"-"9"])+
    4. (["0"-"9"])+ "." (["0"-"9"])* | (["0"-"9"])* "." (["0"-"9"])+

JavaCC spec for the tokens from Fig 2.2
    PARSER_BEGIN(MyParser)
    class MyParser {}
    PARSER_END(MyParser)

    /* For the regular expression on the right, the token on the left will be returned */
    TOKEN : {
        < IF:     "if" >
      | < #DIGIT: ["0"-"9"] >
      | < ID:     ["a"-"z"] ( ["a"-"z"] | <DIGIT> )* >
      | < NUM:    (<DIGIT>)+ >
      | < REAL:   (<DIGIT>)+ "." (<DIGIT>)* | (<DIGIT>)* "." (<DIGIT>)+ >
    }

JavaCC spec for the tokens from Fig 2.2 (continued)
    /* The regular expressions here will be skipped during lexical analysis */
    SKIP : { < " " > | < "\t" > | < "\n" > }

    /* like SKIP, but the skipped text is accessible from parser actions */
    SPECIAL_TOKEN : { < "--" (["a"-"z"])* ( "\n" | "\r" | "\r\n" ) > }

    /* For any substring not matching the lexical spec, javacc will throw an error */

    /* main rule */
    void start() : {}
    { ( <IF> | <ID> | <NUM> | <REAL> )* }
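When several of these token definitions match at the same position, the generated token manager takes the longest match, and on a tie the definition that appears first in the file. That is why "if" becomes IF while "iff" becomes ID. The sketch below imitates this rule with java.util.regex; the class name LongestMatchSketch and the "KIND:image" output format are assumptions made for illustration, and it only skips blanks, tabs and newlines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match, earliest rule wins ties".
class LongestMatchSketch {
    static final String[] NAMES = {"IF", "ID", "NUM", "REAL"};
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[a-z][a-z0-9]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+")
    };

    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n') { pos++; continue; }  // SKIP
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Strict ">" keeps the earlier rule when two rules match the same length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) throw new IllegalStateException("lexical error at position " + pos);
            out.add(NAMES[best] + ":" + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("if iff 12 3.5 .5"));
    }
}
```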

Grammar Specification with JavaCC

The Form of a Production
    java_return_type java_identifier ( java_parameter_list ) :
    java_block
    { expansion_choices }
• EX:
    void XMLDocument(Logger logger) :
    { int msg = 0; }
    {
        <StartDoc> { print(token); }
        Element(logger)
        <EndDoc>   { print(token); }
      | else()
    }

Example (Grammar 3.30)
    1. P -> L
    2. S -> id := id
    3. S -> while id do S
    4. S -> begin L end
    5. S -> if id then S
    6. S -> if id then S else S
    7. L -> S
    8. L -> L ; S
    7, 8 : L -> S ( ; S )*

JavaCC Version of Grammar 3.30
    PARSER_BEGIN(MyParser)
    public class MyParser {}
    PARSER_END(MyParser)

    SKIP : { " " | "\t" | "\n" }

    TOKEN : {
        <WHILE: "while"> | <BEGIN: "begin"> | <END: "end">
      | <DO: "do"> | <IF: "if"> | <THEN: "then"> | <ELSE: "else">
      | <SEMI: ";"> | <ASSIGN: "=">
      | <#LETTER: ["a"-"z"]>
      | <ID: <LETTER> ( <LETTER> | ["0"-"9"] )* >
    }

JavaCC Version of Grammar 3.30 (cont'd)
    void Prog() : { }
    { StmList() <EOF> }

    void StmList() : { }
    { Stm() ( ";" Stm() )* }

    void Stm() : { }
    {
        <ID> "=" <ID>
      | "while" <ID> "do" Stm()
      | <BEGIN> StmList() <END>
      | "if" <ID> "then" Stm() [ LOOKAHEAD(1) "else" Stm() ]
    }
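JavaCC turns each of these productions into one Java method that decides what to do from the next token. A hand-written sketch of roughly what that looks like is given below. It is not the code JavaCC generates: the class name Grammar330Sketch is hypothetical, tokens are simply whitespace-separated words (so ";" and "=" need spaces around them), and errors are plain IllegalStateExceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hand-written recognizer with one method per production of Grammar 3.30.
class Grammar330Sketch {
    private static final Set<String> KEYWORDS = new HashSet<>(
        Arrays.asList("while", "begin", "end", "do", "if", "then", "else"));
    private final List<String> toks;
    private int pos = 0;

    Grammar330Sketch(String input) {
        String trimmed = input.trim();
        toks = trimmed.isEmpty() ? new ArrayList<String>() : Arrays.asList(trimmed.split("\\s+"));
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + ", found " + peek());
        pos++;
    }

    // ID: a letter followed by letters or digits; keywords are not identifiers.
    private void id() {
        String t = peek();
        if (KEYWORDS.contains(t) || !t.matches("[a-z][a-z0-9]*"))
            throw new IllegalStateException("expected ID, found " + t);
        pos++;
    }

    // Prog -> StmList <EOF>
    void prog() {
        stmList();
        if (pos != toks.size()) throw new IllegalStateException("expected <EOF>, found " + peek());
    }

    // StmList -> Stm ( ";" Stm )*
    void stmList() {
        stm();
        while (peek().equals(";")) { pos++; stm(); }
    }

    // Stm -> ID "=" ID | "while" ID "do" Stm | "begin" StmList "end"
    //      | "if" ID "then" Stm [ "else" Stm ]
    void stm() {
        switch (peek()) {
            case "while": pos++; id(); expect("do"); stm(); break;
            case "begin": pos++; stmList(); expect("end"); break;
            case "if":
                pos++; id(); expect("then"); stm();
                // Greedy choice: an "else" always binds to the nearest "if".
                if (peek().equals("else")) { pos++; stm(); }
                break;
            default: id(); expect("="); id();
        }
    }

    static boolean accepts(String input) {
        try { new Grammar330Sketch(input).prog(); return true; }
        catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(accepts("begin a = b ; if c then d = e else f = g end"));
    }
}
```

The optional else branch is taken whenever the next token is "else", which is the same decision that the [ LOOKAHEAD(1) "else" Stm() ] hint asks JavaCC to make.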

Types of productions
• production ::= javacode_production
               | regular_expr_production
               | bnf_production
               | token_manager_decls
Note:
  - 1 and 3 are used to define the grammar.
  - 2 is used to define tokens.
  - 4 is used to embed code into the token manager.

JAVACODE production
• javacode_production ::= "JAVACODE" java_return_type java_id "(" java_param_list ")" java_block
• Note:
  - Used to define nonterminals for recognizing something that is hard to parse using normal productions.

Example JAVACODE
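The code for this slide did not survive in the text. A typical use of a JAVACODE production is skipping a block up to its matching closing brace, which is awkward to write as a normal production. The following plain-Java sketch shows only the idea. It is an assumption, not the slide's original example: the class BraceSkipSketch and its array-of-strings token model are hypothetical, and a real JAVACODE body would peek with getToken(1) and consume with getNextToken().

```java
// Hypothetical illustration of the logic a JAVACODE "skip to matching brace" production would contain.
class BraceSkipSketch {
    // Starting after an already consumed "{", returns the index of the "}" that closes it.
    static int skipToMatchingBrace(String[] tokens, int pos) {
        int nesting = 1;
        while (pos < tokens.length) {
            String tok = tokens[pos];            // peek at the next token
            if (tok.equals("{")) nesting++;
            if (tok.equals("}")) {
                nesting--;
                if (nesting == 0) return pos;    // leave the closing brace for the caller
            }
            pos++;                               // consume the token
        }
        throw new IllegalStateException("unbalanced braces");
    }

    public static void main(String[] args) {
        String[] tokens = "x { y } z } w".split(" ");
        System.out.println(skipToMatchingBrace(tokens, 0));
    }
}
```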


TOKEN_MANAGER_DECLS
    token_manager_decls ::= TOKEN_MGR_DECLS : java_block
• The token manager declarations start with the reserved word "TOKEN_MGR_DECLS" followed by a ":" and then a set of Java declarations and statements (the Java block).
• These declarations and statements are written into the generated token manager (MyParserTokenManager.java) and are accessible from within lexical actions.
• There can only be one token manager declaration in a JavaCC grammar file.

regular_expression_production
    regular_expr_production ::= [ lexical_state_list ]
                                regexpr_kind [ [ IGNORE_CASE ] ] :
                                { regexpr_spec ( | regexpr_spec )* }
• regexpr_kind ::= TOKEN | SPECIAL_TOKEN | SKIP | MORE
• TOKEN is used to define normal tokens.
• SKIP is used to define skipped tokens (not passed on to the parser).
• MORE is used to define semi-tokens (i.e., only part of a token).
• SPECIAL_TOKEN is between TOKEN and SKIP in that it is passed on to the parser and accessible to parser actions, but is ignored by production rules (not counted as a token). Useful for representing comments.

lexical_state_list
    lexical_state_list ::= < * > | < java_identifier ( , java_identifier )* >
• The lexical state list describes the set of lexical states for which the corresponding regular expression production applies.
• If this is written as "<*>", the regular expression production applies to all lexical states. Otherwise, it applies to all the lexical states in the identifier list within the angular brackets.
• If omitted, then a DEFAULT lexical state is assumed.

regexpr_spec
    regexpr_spec ::= regular_expression1 [ java_block ] [ : java_identifier ]
• Meaning:
• When regular_expression1 is matched, then
  - if java_block exists, execute it;
  - if java_identifier appears, transition to that lexical state.

regular_expression
    regular_expression ::= java_string_literal
                         | < [ [#] java_identifier : ] complex_regular_expression_choices >
                         | <java_identifier>
                         | <EOF>
• (1) java_string_literal is for unnamed tokens; it is matched only by the string denoted by itself.
• (2) is used to define a labeled regular expression; if # occurs, the label is not visible outside the current TOKEN section.
• (3) <java_identifier> is a reference to another labeled regular_expression.
  - used in bnf_production
• <EOF> is matched by the end-of-file character only.

Example
    <DEFAULT, LEX_ST2> TOKEN [IGNORE_CASE] : {
        < FLOATING_POINT_LITERAL:
              (["0"-"9"])+ "." (["0"-"9"])* (<EXPONENT>)? (["f","F","d","D"])?
            | "." (["0"-"9"])+ (<EXPONENT>)? (["f","F","d","D"])?
            | (["0"-"9"])+ <EXPONENT> (["f","F","d","D"])?
            | (["0"-"9"])+ (<EXPONENT>)? ["f","F","d","D"]
        > { // do Something
          } : LEX_ST1
      | < #EXPONENT: ["e","E"] (["+","-"])? (["0"-"9"])+ >
    }
• Note: if # is omitted, E123 will be recognized erroneously as a token of kind EXPONENT.

Structure of complex_regular_expression
• complex_regular_expression_choices ::= complex_regular_expression ( | complex_regular_expression )*
• complex_regular_expression ::= ( complex_regular_expression_unit )*
• complex_regular_expression_unit ::= java_string_literal
      | < java_identifier >
      | character_list
      | ( complex_regular_expression_choices ) [+|*|?]
• Note: units are combined by concatenation (juxtaposition) into a complex_regular_expression; these are combined by choice ( | ) into complex_regular_expression_choices; and ( ... ) [+|*|?] turns choices back into a unit.

character_list
    character_list ::= [~] [ [ character_descriptor ( , character_descriptor )* ] ]
    character_descriptor ::= java_string_literal [ - java_string_literal ]
    java_string_literal ::= // reference to java grammar
                            " singleCharString* "
note: java_string_literal here is restricted to length 1.
ex:
  - ~["a","b"] --- all chars but a and b.
  - ["a"-"f", "0"-"9", "A","B","C","D","E","F"] --- hexadecimal digit.
  - ["a","b"]+ is not a regular_expression_unit. Why?
• It should be written ( ["a","b"] )+ instead.

bnf_production
• bnf_production ::= java_return_type java_identifier "(" java_parameter_list ")" ":"
                     java_block
                     "{" expansion_choices "}"
• expansion_choices ::= expansion ( "|" expansion )*
• expansion ::= ( expansion_unit )*

expansion_unit
• expansion_unit ::= local_lookahead
      | java_block
      | "(" expansion_choices ")" [ "+" | "*" | "?" ]
      | "[" expansion_choices "]"
      | [ java_assignment_lhs "=" ] regular_expression
      | [ java_assignment_lhs "=" ] java_identifier "(" java_expression_list ")"
Notes:
  - 1 is for lookahead
  - 2 is for semantic action
  - 4 = ( … )?
  - 5 is for token match
  - 6 is for match of another nonterminal

lookahead
• local_lookahead ::= "LOOKAHEAD" "(" [ java_integer_literal ] [ "," ] [ expansion_choices ] [ "," ] [ "{" java_expression "}" ] ")"
• Notes:
• 3 components: max # of lookahead tokens + syntax + semantics
• examples:
  - LOOKAHEAD(3)
  - LOOKAHEAD(5, Expr() <INT> | <REAL>, { true } )
• More on LOOKAHEAD
  - see the minitutorial on javacc.dev.java.net

JavaCC API
• Non-terminals in the input grammar
• NT is a nonterminal => returntype NT(parameters) throws ParseException; is generated in the parser class
• API for parser actions
• Token token;
  - variable always holds the last token and can be used in parser actions.
  - exactly the same as the token returned by getToken(0).
  - two other methods, getToken(int i) and getNextToken(), can also be used in actions to traverse the token list.

Token class
• public int kind;
  - 0 for <EOF>
• public int beginLine, beginColumn, endLine, endColumn;
• public String image;
• public Token next;
• public Token specialToken;
• public String toString() { return image; }
• public static final Token newToken(int ofKind)

Error reporting and recovery
• It is not user friendly to throw an exception and exit the parsing once a syntax error is encountered.
• Two exceptions, not expected to be recovered
• Error reporting
  - modify ParseException.java or TokenMgrError.java
  - … method is always invokable in a parser action to report an error

• Shallow Error Recovery
• Deep Error Recovery

Shallow recovery: the error can be recovered by an additional choice:

Deep recovery
• Same example:
    void Stm() : {}
    { IfStm() | WhileStm() }
• But this time the error occurs during parsing inside IfStm() or WhileStm() instead of at the lookahead entry.
• The approach: use the Java try-catch construct.
    void Stm() : {}
    {
      try {
        ( IfStm() | WhileStm() )
      } catch (ParseException e) {
        error_skipto(SEMICOLON);
      }
    }
note: this is the new syntax for javacc bnf_production.
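The slide calls error_skipto(SEMICOLON) without showing it. The usual idea is to discard tokens until the given kind has been consumed, so parsing can resume at the next statement. The sketch below simulates that over a plain list of strings. It is an illustration under assumptions: SkipToSketch, its simplified statement form ("if" or "while", a one-letter identifier, then ";") and the use of IllegalStateException in place of ParseException are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of deep error recovery with a skip-to-token helper.
class SkipToSketch {
    static String at(List<String> toks, int i) { return i < toks.size() ? toks.get(i) : "<EOF>"; }

    // Stm -> ( "if" | "while" ) ID ";"   (simplified statement form for this sketch)
    static int stm(List<String> toks, int pos) {
        String head = at(toks, pos);
        if (!head.equals("if") && !head.equals("while"))
            throw new IllegalStateException("unexpected " + head);
        if (!at(toks, pos + 1).matches("[a-z]"))
            throw new IllegalStateException("expected ID, found " + at(toks, pos + 1));
        if (!at(toks, pos + 2).equals(";"))
            throw new IllegalStateException("expected ;, found " + at(toks, pos + 2));
        return pos + 3;
    }

    // Consumes tokens up to and including the first token equal to kind.
    static int errorSkipTo(List<String> toks, int pos, String kind) {
        while (pos < toks.size()) {
            String t = toks.get(pos++);
            if (t.equals(kind)) break;
        }
        return pos;
    }

    // Parses every statement, recovering at ";" after an error; returns the error count.
    static int parseAll(List<String> toks) {
        int pos = 0;
        int errors = 0;
        while (pos < toks.size()) {
            try {
                pos = stm(toks, pos);
            } catch (IllegalStateException e) {
                errors++;
                pos = errorSkipTo(toks, pos, ";");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> toks = Arrays.asList("if a ; while ; if b ; x y ; while c ;".split(" "));
        System.out.println("errors: " + parseAll(toks));
    }
}
```

Because the helper always consumes at least one token, the loop makes progress even when a statement is broken at its very first token.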

References:
• JavaCC web site: http://Javacc.dev.java.net
• JavaCC documentation: https://javacc.dev.java.net/doc/docindex.html

Looking ahead in javacc

What's LOOKAHEAD?
• What strings are matched by Input()?
  - abcc (yes)
  - abc (no!!)
    void Input() : {}
    { "a" BC() "c" }

    void BC() : {}
    { "b" [ "c" ] }
• Why?
  - javacc's default greedy lookahead algorithm

Why is abcc matched?
    step       action                 remaining input
    Input()                           abcc
    "a"        consume a              abcc
    BC()                              bcc
    "b"        consume b              bcc
    ["c"]      greedily consume c     cc
    "c"        consume c              c
    succeed!                          (empty)

Why is abc not matched?
    step       action                 remaining input
    Input()                           abc
    "a"        consume a              abc
    BC()                              bc
    "b"        consume b              bc
    ["c"]      greedily consume c     c
               (even if no consumption seems better)
    "c"        needs a 'c'            (empty): no match
    fail!
• Why such behavior?
  1. one-symbol lookahead (for performance)
  2. avoid backtracking!
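The two traces can be reproduced with a few lines of plain Java. This is a sketch of the decision logic only, not generated parser code: the class GreedySketch is hypothetical, characters stand in for tokens, and unlike the slide's Input() it also insists that the whole string is consumed.

```java
// Hypothetical simulation of:  Input -> "a" BC "c"   and   BC -> "b" [ "c" ]
// with one-token lookahead and no backtracking.
class GreedySketch {
    static boolean input(String s) {
        int pos = 0;
        if (pos < s.length() && s.charAt(pos) == 'a') pos++; else return false;
        pos = bc(s, pos);
        if (pos < 0) return false;
        if (pos < s.length() && s.charAt(pos) == 'c') pos++; else return false;
        return pos == s.length();   // assumption of this sketch: no trailing input allowed
    }

    // Returns the position after BC, or -1 on failure.
    static int bc(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == 'b') pos++; else return -1;
        // Greedy optional part: if the next token is "c", take it and never reconsider.
        if (pos < s.length() && s.charAt(pos) == 'c') pos++;
        return pos;
    }

    public static void main(String[] args) {
        System.out.println("abcc -> " + input("abcc"));
        System.out.println("abc  -> " + input("abc"));
    }
}
```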

How to match both inputs?
• Rewrite the grammar!
• Increase the lookahead number

What about these rewritings?
    void Input() : {}                  // good!
    { "a" "b" "c" [ "c" ] }

    void Input() : {}
    { "a" "b" "c" "c" | "a" "b" "c" }

    void Input() : {}
    { "a" ( BC1() | BC2() ) }

    void BC1() : {}
    { "b" "c" "c" }

    void BC2() : {}
    { "b" "c" [ "c" ] }

Looking Ahead
• Backtracking is unacceptable in a language parser.
• LOOKAHEAD:
  - The process of exploring tokens further in the input stream to determine the decision at various choice points.
  - Once the parser makes a decision, it commits to it and there is no backtracking!
  - Since some of these decisions may be made with less than perfect information, you need to know something about LOOKAHEAD to make your grammar work correctly.
• The two ways in which you make the choice decisions work properly are:
  1. Modify the grammar to make it simpler.
  2. Insert hints at the more complicated choice points to help the parser make the right choices.

Four Choice Points in javacc
• ( exp1 | exp2 | ... | expn )
  - Which one to match?
• ( … )?
  - To match the content inside () or bypass it?
• ( … )*
  - To leave, or match and then repeat?
• ( … )+ = (…)(…)*
  - To leave, or match and repeat after the first match?

The Default Algorithm for choice |
• The default choice determination algorithm looks ahead 1 token in the input stream and uses this to help make its choice at choice points.
    void basic_expr() : {}
    {
        <ID> "(" expr() ")"    // Choice 1
      | "(" expr() ")"         // Choice 2
      | "new" <ID>             // Choice 3
    }
The choice determination algorithm:
    if (next token is <ID>) {
      choose Choice 1
    } else if (next token is "(") {
      choose Choice 2
    } else if (next token is "new") {
      choose Choice 3
    } else {
      produce an error message
    }

A Modified Grammar
    void basic_expr() : {}
    {
        <ID> "(" expr() ")"    // Choice 1
      | "(" expr() ")"         // Choice 2
      | "new" <ID>             // Choice 3
      | <ID> "." <ID>          // Choice 4
    }
What happens on <ID>? Why?
    Warning: Choice conflict involving two expansions at line 25, column 3 and line 31, column 3 respectively. A common prefix is: <ID>
    Consider using a lookahead of 2 for earlier expansion.

Greedy behavior for (…)*
    void identifier_list() : {}
    { <ID> ( "," <ID> )* }
• Suppose the first <ID> has already been matched and that the parser has reached the choice point (the (...)* construct). Here's how the choice determination algorithm works:
    while (next token is ",") {
      choose the nested expansion (i.e., go into the (...)* construct)
      consume the "," token
      if (next token is <ID>) consume it, otherwise report error
    }
Note: the choice determination algorithm does not look beyond the (...)* construct.

Another Example
• When making a choice at ( "," <ID> )*, the parser will always go into the (...)* construct if the next token is a ",".
  - It will do this even when identifier_list was called from funny_list and the token after the "," is an <INT>.
  - Intuitively, the right thing to do in this situation is to skip the (...)* construct and return to funny_list.
    void identifier_list() : {}
    { <ID> ( "," <ID> )* }

    void funny_list() : {}
    { identifier_list() "," <INT> }

‡Essentially..A Concrete input One input "id1. the parser will complain that it encountered a 5 when it was expecting an <ID>. id2.)* construct at line 25. The generated parser will still work . column 8. javacc would give the warning message: Warning: Choice conflict in (..³ Consider using a lookahead of 2 or more for nested expansion. one of which is: ". Expansion nested within construct and expansion following construct have common prefixes. ‡Note ± during parser generation. 5".except that it probably doesn¶t do what you expect 72 . JavaCC is saying it has detected a situation which may cause the parser to do strange things.

Multiple Token Lookahead Specs
• The default algorithm works fine in most situations. In cases where it does not work well, javacc provides you with warning messages.
• If javacc processes your file without producing any warnings, then the grammar is an LL(1) grammar.
• There are two options for lookahead if your grammar is not LL(1):
  - Modify your grammar to make it LL(1).
  - Give more lookahead, globally or where needed.

Option 1 - Modify your grammar
• You can modify your grammar so that the warning messages go away. That is, you can attempt to make your grammar LL(1) by making some changes to it.
• But this does not always work!
    void basic_expr() : {}
    {
        <ID> "(" expr() ")"    // Choice 1
      | "(" expr() ")"         // Choice 2
      | "new" <ID>             // Choice 3
      | <ID> "." <ID>          // Choice 4
    }
Factor the common left parts:
    void basic_expr() : {}
    {
        <ID> ( "(" expr() ")" | "." <ID> )
      | "(" expr() ")"
      | "new" <ID>
    }

Factoring does not always work!!
    void basic_expr() : {}
    {
        { initMethodTables(); } <ID> "(" expr() ")"
      | "(" expr() ")"
      | "new" <ID>
      | { initObjectTables(); } <ID> "." <ID>
    }
• Since the actions are different, left-factoring cannot be performed.

Option 2 - Increase lookahead
• You can provide the generated parser with some hints to help it out in the non-LL(1) situations.
• All such hints are specified by either
  - setting the global LOOKAHEAD option to a larger value, or
  - using the LOOKAHEAD(...) construct to provide a local hint at the problematic choice points.
• Comparison between the two options
  - Option 1 makes your grammar perform better.
  - Option 2 gives you a simpler grammar: one that is easier to develop and maintain, and more human friendly.
  - Sometimes Option 2 is the only choice.

• Global option LOOKAHEAD
  - options { LOOKAHEAD=2; … }
• Local lookahead:
    void basic_expr() : {}
    {
        LOOKAHEAD(2)
        <ID> "(" expr() ")"    // Choice 1
      | "(" expr() ")"         // Choice 2
      | "new" <ID>             // Choice 3
      | <ID> "." <ID>          // Choice 4
    }

    if (next 2 tokens are <ID> and "(") {
      choose Choice 1
    } else if (next token is "(") {
      choose Choice 2
    } else if (next token is "new") {
      choose Choice 3
    } else if (next token is <ID>) {
      choose Choice 4
    } else {
      produce an error message
    }
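The difference between the default one-token decision and the LOOKAHEAD(2) decision for basic_expr can be written out as two small Java functions. This is a sketch of the decision procedure described on the slides, not generated code: the class ChoiceSketch is hypothetical, and tokens are modelled as the strings "ID", "(", ".", "new".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of choice selection in basic_expr with 1 and 2 tokens of lookahead.
class ChoiceSketch {
    static String at(List<String> toks, int i) { return i < toks.size() ? toks.get(i) : "<EOF>"; }

    // Default algorithm: only the next token is inspected, so Choice 4 is never selected.
    static int chooseWithLookahead1(List<String> toks) {
        String t1 = at(toks, 0);
        if (t1.equals("ID")) return 1;
        if (t1.equals("(")) return 2;
        if (t1.equals("new")) return 3;
        return -1;   // error
    }

    // With LOOKAHEAD(2) on Choice 1: Choice 1 needs ID followed by "(".
    static int chooseWithLookahead2(List<String> toks) {
        String t1 = at(toks, 0);
        String t2 = at(toks, 1);
        if (t1.equals("ID") && t2.equals("(")) return 1;
        if (t1.equals("(")) return 2;
        if (t1.equals("new")) return 3;
        if (t1.equals("ID")) return 4;
        return -1;   // error
    }

    public static void main(String[] args) {
        List<String> fieldAccess = Arrays.asList("ID", ".", "ID");
        System.out.println(chooseWithLookahead1(fieldAccess));   // picks Choice 1, which then fails on "."
        System.out.println(chooseWithLookahead2(fieldAccess));   // picks Choice 4
    }
}
```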

    void identifier_list() : {}
    { <ID> ( LOOKAHEAD(2) "," <ID> )* }

    while (next 2 tokens are "," and <ID>) {
      choose the nested expansion (i.e., go into the (...)* construct)
      consume the "," token
      consume the <ID> token
    }
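To see the effect on the input "id1, id2, 5", the loop condition can be simulated with and without the two-token check. The sketch below is an illustration under assumptions: ListSketch is a hypothetical class, tokens are modelled as the strings "ID", "," and "INT", and funnyList additionally requires that all input is consumed.

```java
// Hypothetical simulation of identifier_list() called from funny_list().
class ListSketch {
    static String at(String[] toks, int i) { return i < toks.length ? toks[i] : "<EOF>"; }

    static int expect(String[] toks, int pos, String kind) {
        if (!at(toks, pos).equals(kind))
            throw new IllegalStateException("expected " + kind + ", found " + at(toks, pos));
        return pos + 1;
    }

    // identifier_list -> ID ( [LOOKAHEAD(2)] "," ID )*
    static int identifierList(String[] toks, int pos, boolean lookahead2) {
        pos = expect(toks, pos, "ID");
        // Default: enter the loop on ",". With LOOKAHEAD(2): also require ID after the ",".
        while (at(toks, pos).equals(",") && (!lookahead2 || at(toks, pos + 1).equals("ID"))) {
            pos = expect(toks, pos, ",");
            pos = expect(toks, pos, "ID");
        }
        return pos;
    }

    // funny_list -> identifier_list "," INT
    static boolean funnyList(String[] toks, boolean lookahead2) {
        try {
            int pos = identifierList(toks, 0, lookahead2);
            pos = expect(toks, pos, ",");
            pos = expect(toks, pos, "INT");
            return pos == toks.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] toks = {"ID", ",", "ID", ",", "INT"};   // models "id1, id2, 5"
        System.out.println("default:      " + funnyList(toks, false));
        System.out.println("LOOKAHEAD(2): " + funnyList(toks, true));
    }
}
```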

Syntactic lookahead
• How many tokens of lookahead are needed in the Java type declaration?
    void TypeDeclaration() : {}
    {
        ClassDeclaration()        // public static final class
      | InterfaceDeclaration()    // public abstract abstract interface
    }

‡ Maybe 100 is ok as well ! 80 .Solution 1 void TypeDeclaration() : {} { LOOKAHEAD(2147483647) ClassDeclaration() | InterfaceDeclaration() } ‡ Where 2147483647 is Integer.MAX_VALUE.

Solution 2 - syntactic lookahead
    void TypeDeclaration() : {}
    {
        LOOKAHEAD( ClassDeclaration() )
        ClassDeclaration()
      | InterfaceDeclaration()
    }
• Looking ahead over a complete ClassDeclaration() takes too much time and does a lot of unnecessary checking.

Solution 3 - a better one
void TypeDeclaration() : {} { LOOKAHEAD( ( "abstract" | "final" | "public" )* "class" ) ClassDeclaration() | InterfaceDeclaration() }


Solution 4 - syntactic lookahead + number bound
    void TypeDeclaration() : {}
    {
        LOOKAHEAD( 10, ( "abstract" | "final" | "public" )* "class" )
        ClassDeclaration()
      | InterfaceDeclaration()
    }
• Meaning: look ahead at most 10 tokens; if they do not violate the pattern ( "abstract" | "final" | "public" )* "class", try ClassDeclaration().
• The default maximum number of tokens to be looked ahead is Integer.MAX_VALUE for syntactic lookahead.
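A bounded syntactic lookahead can be pictured as a scan that checks tokens against the pattern and stops with success either when the pattern is complete or when the bound is used up without a violation. The sketch below follows that reading of the slide. SyntacticLookaheadSketch and its method names are hypothetical, and tokens are plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of LOOKAHEAD( limit, ( "abstract" | "final" | "public" )* "class" ).
class SyntacticLookaheadSketch {
    private static final Set<String> MODIFIERS =
        new HashSet<>(Arrays.asList("abstract", "final", "public"));

    // True if the upcoming tokens match the pattern, or if limit tokens were
    // inspected without finding a violation.
    static boolean looksLikeClass(List<String> toks, int pos, int limit) {
        int seen = 0;
        while (true) {
            if (seen == limit) return true;            // bound reached, nothing violated so far
            if (pos >= toks.size()) return false;      // end of input before "class"
            String t = toks.get(pos);
            if (t.equals("class")) return true;        // pattern completed
            if (!MODIFIERS.contains(t)) return false;  // pattern violated
            pos++;
            seen++;
        }
    }

    static String choose(List<String> toks, int limit) {
        return looksLikeClass(toks, 0, limit) ? "ClassDeclaration" : "InterfaceDeclaration";
    }

    public static void main(String[] args) {
        System.out.println(choose(Arrays.asList("public", "final", "class", "A"), 10));
        System.out.println(choose(Arrays.asList("public", "abstract", "interface", "I"), 10));
    }
}
```

A bound that is too small can give a wrong answer: with a limit of 2, the input "public abstract interface I" is reported as a class, because the violation only shows up at the third token.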

Semantic lookahead
• Could we make the parser choose the 2nd alternative on input "a" "a" without changing the order?
    void Input() : {}
    { "a" | "a" "a" }
• Syntactic lookahead cannot do this, since it cannot express a condition like "the next token is "a" and the following token is not "a"".

Solution: semantic lookahead
    void Input() : {}
    {
        LOOKAHEAD( { getToken(1).kind == A && getToken(2).kind != A } )
        <A: "a">
      | "a" "a"
    }

• syntactic + semantic
    void Input() : {}
    {
        LOOKAHEAD( "a", { getToken(2).kind != A } )
        <A: "a">
      | "a" "a"
    }
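The semantic condition is just a boolean Java expression over the upcoming tokens, which is why it can say "not followed by a". The sketch below evaluates the same condition over an array of token kinds. SemanticLookaheadSketch, its constants and the helper getTokenKind are hypothetical stand-ins for the generated parser's getToken(i).kind, with kind 0 used for end of input as on the Token class slide.

```java
// Hypothetical simulation of the semantic lookahead in Input().
class SemanticLookaheadSketch {
    static final int EOF = 0;
    static final int A = 1;

    // Kind of the i-th upcoming token (i = 1 is the next token), EOF past the end.
    static int getTokenKind(int[] kinds, int pos, int i) {
        int idx = pos + i - 1;
        return idx < kinds.length ? kinds[idx] : EOF;
    }

    // Returns 1 for the alternative <A>, 2 for the alternative "a" "a", -1 for a parse error.
    static int choose(int[] kinds, int pos) {
        boolean semanticLookahead =
            getTokenKind(kinds, pos, 1) == A && getTokenKind(kinds, pos, 2) != A;
        if (semanticLookahead) return 1;
        if (getTokenKind(kinds, pos, 1) == A) return 2;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(choose(new int[] {A}, 0));      // single "a"
        System.out.println(choose(new int[] {A, A}, 0));   // "a" "a"
    }
}
```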

Complete LOOKAHEAD directive
    LOOKAHEAD( amount, expansion, { boolean_expression } ) followExpansion
• At least one of the three entries must be present.
• The default value for each of these entries is defined below:
  - amount: 2147483647 if expansion is present, otherwise 0
  - expansion: followExpansion
  - { boolean_expression }: { true }
Note: if amount = 0, no syntactic LOOKAHEAD is performed.

References on javacc lookahead
• https://javacc.dev.java.net/doc/lookahead.html
• http://userpages.umbc.edu/~vick/431/Lectures/Spring06/4_Parsing/1_LL/Looking_ahead_in_javacc.ppt
