Grammar Variation in Compiler Design

Carl Wu
Three topics
• Syntax Grammar vs. AST
• Component(?)-based grammar
• Aspect-oriented grammar
Grammar vs. AST (I)
How to automatically generate a tree from a grammar?
Grammar vs. AST (I)
Stmt ::= Block
       | “if” Exp “then” Stmt
       | IdUse “:=” Exp
JastAdd Specification (Tree)

abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;
Grammar vs. AST (I)
Restricted CFG Definition

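A ::= B C D      √  => aggregation (A has the children B, C and D)
A ::= B | C | D  √  => inheritance (B, C and D are subclasses of A)
A ::= B C | D    ×  (sequence and alternation may not be mixed in one rule)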
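The three cases above can be checked mechanically. The following is a minimal sketch, assuming a production's right-hand side is given as a list of alternatives, each a list of symbols; the class name RcfgCheck and the Kind enum are hypothetical names chosen for illustration, and a rule with one alternative is treated as aggregation.

```java
import java.util.List;

final class RcfgCheck {
    enum Kind { AGGREGATION, INHERITANCE, REJECTED }

    // alternatives: one inner list of symbols per alternative of the rule.
    static Kind classify(List<List<String>> alternatives) {
        if (alternatives.isEmpty()) {
            throw new IllegalArgumentException("a rule needs at least one alternative");
        }
        if (alternatives.size() == 1) {
            return Kind.AGGREGATION;                 // A ::= B C D
        }
        // Several alternatives are allowed only if each one is a single symbol.
        boolean allSingle = alternatives.stream().allMatch(alt -> alt.size() == 1);
        return allSingle ? Kind.INHERITANCE          // A ::= B | C | D
                         : Kind.REJECTED;            // A ::= B C | D
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(List.of("B", "C", "D"))));
        System.out.println(classify(List.of(List.of("B"), List.of("C"), List.of("D"))));
        System.out.println(classify(List.of(List.of("B", "C"), List.of("D"))));
    }
}
```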
Grammar vs. AST (I)
RCFG Specification
Stmt :: Block | IfStmt | AssignStmt
IfStmt :: “if” Exp “then” Stmt
AssignStmt :: IdUse “:=” Exp
Stmt
    IfStmt: Exp Stmt
    Block
    AssignStmt: IdUse Exp
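One way the RCFG specification above could drive tree generation is sketched below. It assumes rules are supplied as plain strings with terminals in straight double quotes and alternatives separated by "|"; the class name RcfgToClasses, the ASTNode root class and the generated field names are hypothetical, and the output is only a list of Java class headers, not a full generator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RcfgToClasses {
    // rules: nonterminal -> right-hand side, in declaration order.
    static List<String> generate(Map<String, String> rules) {
        // First pass: every alternative of an inheritance rule gets that rule as its parent.
        Map<String, String> parentOf = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String[] alts = rule.getValue().split("\\s*\\|\\s*");
            if (alts.length == 1) continue;
            for (String alt : alts) {
                if (alt.trim().split("\\s+").length != 1) {
                    throw new IllegalArgumentException("not a restricted rule: " + rule.getKey());
                }
                parentOf.put(alt.trim(), rule.getKey());
            }
        }
        // Second pass: emit one class header per rule.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String name = rule.getKey();
            String parent = parentOf.getOrDefault(name, "ASTNode");
            String[] alts = rule.getValue().split("\\s*\\|\\s*");
            if (alts.length > 1) {                   // inheritance rule
                out.add("abstract class " + name + " extends " + parent + " {}");
                continue;
            }
            StringBuilder fields = new StringBuilder();
            int index = 0;
            for (String sym : alts[0].trim().split("\\s+")) {
                if (sym.startsWith("\"")) continue;  // terminals are not AST children
                fields.append(' ').append(sym).append(' ')
                      .append(sym.toLowerCase()).append(index++).append(';');
            }
            out.add("class " + name + " extends " + parent + " {" + fields + " }");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("Stmt", "Block | IfStmt | AssignStmt");
        rules.put("IfStmt", "\"if\" Exp \"then\" Stmt");
        rules.put("AssignStmt", "IdUse \":=\" Exp");
        generate(rules).forEach(System.out::println);
    }
}
```

Block has no rule of its own in the specification, so this sketch emits no class for it; a fuller generator would need a policy for such symbols.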

Grammar vs. AST (II)
Parse tree vs. IR tree

• In an IDE, there are multiple visitors for the same source code (more than 12!).
• Different requirements for the tree structure:
– Syntax vs. semantics
– Immutable vs. transformable (optimization)
– Parse tree vs. IR tree
• Generate two tree structures from the same grammar!
• One immutable, strongly typed, concrete parse tree – read only!
• One transformable, untyped, abstract IR tree – read and write!
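IfStmt :: “if” Exp “then” Stmt
class ASTNode {
    // Untyped, writable child array: the IR tree.
    protected ASTNode[] children;
}
// Exp and Stmt are assumed to extend ASTNode.
class IfStmt extends ASTNode {
    // Typed, final fields: the concrete parse tree, including tokens.
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;
    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction (tokens are dropped)
        children = new ASTNode[] { exp, stmt };
    }
}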
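To illustrate why the two views are kept apart, here is a self-contained sketch in which a rewrite changes only the IR view while the typed parse-tree fields stay as parsed. The stub classes Token, Exp and Stmt and the method rewriteCondition are assumptions made for this example, not part of the slides.

```java
final class TwoTrees {
    static class Token {
        final String text;
        Token(String text) { this.text = text; }
    }
    static class ASTNode {
        ASTNode[] children = new ASTNode[0];   // untyped, writable IR view
    }
    static class Exp extends ASTNode {
        final String source;
        Exp(String source) { this.source = source; }
    }
    static class Stmt extends ASTNode { }

    static class IfStmt extends Stmt {
        // Typed, final fields: the read-only parse-tree view.
        final Token tokenIf;
        final Exp exp;
        final Token tokenThen;
        final Stmt stmt;
        IfStmt(Token tokenIf, Exp exp, Token tokenThen, Stmt stmt) {
            this.tokenIf = tokenIf;
            this.exp = exp;
            this.tokenThen = tokenThen;
            this.stmt = stmt;
            children = new ASTNode[] { exp, stmt };
        }
    }

    // Hypothetical optimization step: replace the condition in the IR view only.
    static void rewriteCondition(IfStmt node, Exp replacement) {
        node.children[0] = replacement;
    }

    public static void main(String[] args) {
        IfStmt s = new IfStmt(new Token("if"), new Exp("1 < 2"), new Token("then"), new Stmt());
        rewriteCondition(s, new Exp("true"));
        System.out.println(s.exp.source);                   // parse-tree view: 1 < 2
        System.out.println(((Exp) s.children[0]).source);   // IR view: true
    }
}
```

A syntax-oriented visitor (for example, a formatter) would read the final fields, while an optimizer would read and write the children array.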
Component(?)-based grammar
Component vs. module
• What is the difference between a component and a module?
• What is a modularized grammar?
• What is an ideal component-based grammar?
Component vs. module
[Figure: modularized grammar vs. component-based grammar. Box labels: Grammar, Module, Component, Parser]
Benefits
• Benefits of a modularized grammar
– Easy to read, write and change
– Eliminates naming conflicts
• Additional benefits of a component-based grammar
– Each component can be designed, developed and tested individually.
– A change to one component does not require recompiling all the other components.
– Different types of grammars/parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.
Difficulty in designing a component-based grammar
• No clear guards between two components.
– Switch control to a new parser or stay in the same one?
– Suitable for embedded languages, e.g., JScript in HTML
– Not suitable for an integral language, e.g., Java
• Too much coupling between two components.
– A component is not just reused as a whole; its internal productions and symbols may also be reused.
– Not applicable to LR parsers: once the table is built, you can’t reuse the internal productions (there is no way to jump into a table).
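The embedded-language case can be sketched as follows: a driver hands text to a second component whenever it sees a guard. This is a minimal illustration, assuming the literal strings "<script>" and "</script>" act as guards; the Component interface and the class GuardedComponents are hypothetical, and each component's parse method stands in for a real parser of any kind (LL, LALR, ...).

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedComponents {
    // Hypothetical component interface: each component owns its own parser.
    interface Component {
        String parse(String source);
    }

    static final String OPEN = "<script>";
    static final String CLOSE = "</script>";

    // Splits the input at the guards and dispatches each piece to its component.
    static List<String> parse(String input, Component host, Component embedded) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf(OPEN, pos);
            if (open < 0) {                          // no more guards: host parses the rest
                results.add(host.parse(input.substring(pos)));
                break;
            }
            if (open > pos) {
                results.add(host.parse(input.substring(pos, open)));
            }
            int bodyStart = open + OPEN.length();
            int close = input.indexOf(CLOSE, bodyStart);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated " + OPEN);
            }
            // Control switches to the embedded component between the guards.
            results.add(embedded.parse(input.substring(bodyStart, close)));
            pos = close + CLOSE.length();
        }
        return results;
    }

    public static void main(String[] args) {
        Component html = s -> "html[" + s + "]";
        Component script = s -> "script[" + s + "]";
        System.out.println(parse("<p>hi</p><script>x = 1;</script><p>bye</p>", html, script));
    }
}
```

For an integral language such as Java there is no comparable guard between, say, statements and expressions, which is why this dispatch style does not carry over directly.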
Ideal vs. reality
[Figure: two diagrams (ideal and reality) over the same grammar components: Java, Class, Interface, Type, Object_type, Array, Statement, Expression, Binary_expr, Unary_expr, Primary]
Suggestions?
Aspect-oriented grammar
• Join-point: grammar patterns that crosscut multiple productions
• Punctuation, identifiers, modifiers…
Example
• “;” appears 25 times in one of the Java grammars
• “.” appears 74 times in one of the Cobol grammars
• Every one of them must be carefully placed!
<Sentence> ::= <Accept Stm> '.'
             | <Add Stm> '.'
             | <Add Stm Ex> <End-Add Opt> '.'
             | <Call Stm> '.'
             | <Call Stm Ex> <End-Call Opt> '.'
             | <Close Stm> '.'
             | <Compute Stm> '.'
             | <Compute Stm Ex> <End-Compute Opt> '.'
             | <Display Stm> '.'
             | <Divide Stm> '.'
             | <Divide Stm Ex> <End-Divide Opt> '.'
             | <Evaluate Stm> <End-Evaluate Opt> '.'
             | <If Stm> <End-If Opt> '.'
             | <Move Stm> '.'
             | <Move Stm Ex> <End-Move Opt> '.'
             | <Multiply Stm> '.'
             | <Multiply Stm Ex> <End-Multiply Opt> '.'
             | <Open Stm> '.'
             | <Perform Stm> '.'
             | <Perform Stm Ex> <End-Perform Opt> '.'
             | <Read Stm> '.'
             | <Read Stm Ex> <End-Read Opt> '.'
             | <Release Stm> '.'
             | <Rewrite Stm> '.'
             | <Rewrite Stm Ex> <End-Rewrite Opt> '.'
             | <Set Stm> '.'
             | <Start Stm> '.'
             | <Start Stm Ex> <End-Start Opt> '.'
             | <String Stm> '.'
             | <String Stm Ex> <End-String Opt> '.'
             | <Subtract Stm> '.'
             | <Subtract Stm Ex> <End-Substract Opt> '.'
             | <Write Stm> '.'
             | <Write Stm Ex> <End-Write Opt> '.'
             | <Unstring Stm> '.'
             | <Unstring Stm Ex> <End-Unstring Opt> '.'
             | <Misc Stm> '.'
pointcut PreDot(): <Sentence>;

after PreDot(): '.'
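The effect of the PreDot aspect can be sketched as a small grammar weaver. The sketch assumes that a pointcut naming a nonterminal matches every alternative of that nonterminal and that advice is a symbol placed before or after each matched alternative; the class GrammarWeaver and its weave signature are hypothetical, and the base grammar in main is a two-alternative excerpt written without the trailing '.'.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GrammarWeaver {
    // base: nonterminal -> alternatives, each a space-separated symbol string.
    // before/after: advice symbols; an empty string means "no advice".
    static Map<String, List<String>> weave(Map<String, List<String>> base,
                                           String pointcut, String before, String after) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> rule : base.entrySet()) {
            List<String> alternatives = new ArrayList<>();
            for (String alt : rule.getValue()) {
                String woven = alt;
                if (rule.getKey().equals(pointcut)) {   // join point matched
                    if (!before.isEmpty()) woven = before + " " + woven;
                    if (!after.isEmpty()) woven = woven + " " + after;
                }
                alternatives.add(woven);
            }
            result.put(rule.getKey(), alternatives);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> base = new LinkedHashMap<>();
        base.put("<Sentence>", List.of("<Accept Stm>", "<Add Stm Ex> <End-Add Opt>"));
        // pointcut PreDot(): <Sentence>;  after PreDot(): '.'
        System.out.println(weave(base, "<Sentence>", "", "'.'"));
    }
}
```

Because the '.' is added in one place, it cannot be forgotten or misplaced in an individual alternative; paired before/after advice would work the same way for matching delimiters.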
Another example
pointcut Content(): … …
before Content(): “(”;
after Content(): “)”;
Guarantee they match!

Grammar weaving
[Figure: Base grammar + Aspect → Result grammar → Parser]
What do you think?
