Grammar Variation in Compiler Design
Carl Wu
Three topics
• Syntax Grammar vs. AST
• Component(?)-based grammar
• Aspect-oriented grammar
Grammar vs. AST (I)

How to automatically generate a tree from a grammar?
Grammar vs. AST (I)

Stmt ::= Block


| “if” Exp “then” Stmt
| IdUse “:=” Exp
Grammar vs. AST (I)

Stmt ::= Block


| “if” Exp “then” Stmt
| IdUse “:=” Exp

JastAdd Specification (Tree)


abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;
Grammar vs. AST (I)

Restricted CFG Definition


A ::= B C D √ => aggregation
A ::= B | C | D √ => inheritance
A ::= B C | D ×
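The restriction above is what makes class generation mechanical. As a minimal sketch (RcfgClassGen and its generate method are hypothetical names, not part of JastAdd or any other tool), a pure sequence becomes a class with one field per symbol, a pure choice becomes an abstract class with one subclass per alternative, and a production that mixes both is rejected:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class RcfgClassGen {
    // lhs: the nonterminal being defined; rhs: symbols separated by spaces,
    // alternatives separated by '|'. Returns Java class skeletons as text.
    static List<String> generate(String lhs, String rhs) {
        List<String> out = new ArrayList<>();
        if (!rhs.contains("|")) {
            // A ::= B C D  => aggregation: one field per right-hand-side symbol
            StringBuilder sb = new StringBuilder("class " + lhs + " { ");
            for (String sym : rhs.trim().split("\\s+")) {
                sb.append(sym).append(' ').append(sym.toLowerCase()).append("; ");
            }
            out.add(sb.append('}').toString());
            return out;
        }
        // A ::= B | C | D  => inheritance: each alternative must be a single symbol
        out.add("abstract class " + lhs + " {}");
        for (String alt : rhs.split("\\|")) {
            String[] symbols = alt.trim().split("\\s+");
            if (symbols.length != 1) {
                // A ::= B C | D mixes sequence and choice, so it is not allowed
                throw new IllegalArgumentException(
                        "mixed sequence and choice: " + lhs + " ::= " + rhs);
            }
            out.add("class " + symbols[0] + " extends " + lhs + " {}");
        }
        return out;
    }

    public static void main(String[] args) {
        generate("A", "B C D").forEach(System.out::println);
        generate("A", "B | C | D").forEach(System.out::println);
    }
}
```

The sketch emits each production in isolation. A real generator would have to merge the declarations of a symbol that is both a subclass in one production and an aggregate in another.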
Grammar vs. AST (I)

RCFG Specification
Stmt :: Block | IfStmt | AssignStmt
IfStmt :: “if” Exp “then” Stmt
AssignStmt :: IdUse “:=” Exp

Stmt
    IfStmt (Exp, Stmt)
    Block
    AssignStmt (IdUse, Exp)


Grammar vs. AST (II)

Parse tree vs. IR tree


Grammar vs. AST (II)
• In an IDE, there are multiple visitors for the
same source code (>12 !).
• Different requirements for the tree structure:
– Syntax vs. semantics
– Immutable vs. transformable (optimization)
– Parse tree vs. IR tree
Grammar vs. AST (II)
• Generate two tree structures from the same grammar!
• One immutable, strongly typed, concrete parse tree – Read only!
• One transformable, untyped, abstract IR tree – Read and write!
Grammar vs. AST (II)
IfStmt :: “if” Exp “then” Stmt
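class ASTNode {
    // IR tree view: untyped, writable child array
    protected ASTNode[] children;
}
class IfStmt extends ASTNode {
    // Parse tree view: typed, final fields (assumes Exp and Stmt extend ASTNode)
    protected final Token token_if, token_then;
    protected final Exp exp;
    protected final Stmt stmt;
    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction: allocate the array, keep only non-token children
        children = new ASTNode[] { exp, stmt };
    }
}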
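To see how the two views behave, here is a self-contained sketch. Token, Exp, Stmt and SkipStmt are minimal hypothetical stand-ins, getChild/setChild are assumed accessors, and IfStmt extends Stmt here to follow the RCFG. Rewriting the IR child leaves the typed parse tree field untouched:

```java
// Hypothetical stand-ins for the slide's types; only what the demo needs.
class ASTNode {
    protected ASTNode[] children = new ASTNode[0];   // IR view: untyped, writable
    ASTNode getChild(int i) { return children[i]; }
    void setChild(int i, ASTNode node) { children[i] = node; }
}
class Token {
    final String text;
    Token(String text) { this.text = text; }
}
class Exp extends ASTNode {
    final String source;
    Exp(String source) { this.source = source; }
}
class Stmt extends ASTNode { }
class SkipStmt extends Stmt { }   // hypothetical leaf statement

class IfStmt extends Stmt {
    // Parse tree view: typed and final, so it can only be read
    final Token tokenIf, tokenThen;
    final Exp exp;
    final Stmt stmt;
    IfStmt(Token tokenIf, Exp exp, Token tokenThen, Stmt stmt) {
        this.tokenIf = tokenIf;
        this.exp = exp;
        this.tokenThen = tokenThen;
        this.stmt = stmt;
        children = new ASTNode[] { exp, stmt };   // IR view drops the tokens
    }
}

final class DualTreeDemo {
    public static void main(String[] args) {
        IfStmt node = new IfStmt(new Token("if"), new Exp("x > 0"),
                                 new Token("then"), new SkipStmt());
        // An optimization pass rewrites the condition in the IR view only.
        node.setChild(0, new Exp("true"));
        System.out.println(node.exp.source);                  // prints "x > 0"
        System.out.println(((Exp) node.getChild(0)).source);  // prints "true"
    }
}
```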
Component(?)-based grammar
Component vs. module
• What is the difference between a
component and a module?
• What is a modularized grammar?
• What is an ideal component-based
grammar?
Component vs. module

[Figure: modularized grammar vs. component-based grammar. Box labels: Grammar, Module, Component, Parser.]
Benefits
• Benefits from modularized grammar
– Easy to read, write, change
– Eliminate naming conflicts
• Additional benefits brought by component-based grammar
– Each component can be designed, developed and
tested individually.
– A change to one component does not require
recompiling the other components.
– Different types of grammars/parsing algorithms can be
used for different components, e.g., one component
can be LL, another LALR.
Difficulty in designing component-based grammar
• No clear guards between two components.
– Switch the control to a new parser or stay in the same one?
– Suitable for embedded languages, e.g., JScript in HTML
– Not suitable for an integral language, e.g., Java
• Too much coupling between two components.
– Reuse is not limited to the component as a whole; the
internal productions and symbols may also be reused.
– Not applicable to LR parsers: once the table is built,
you can’t reuse the internal productions (no way to
jump into a table).
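The embedded-language case works because the guards are explicit. A minimal sketch, assuming literal opening and closing guard strings and treating each component parser as a black box (ComponentParser and GuardedDispatcher are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interface: each component brings its own parser,
// which could be LL, LALR, or hand-written.
interface ComponentParser {
    String parse(String fragment);
}

final class GuardedDispatcher {
    // Hands text between the guards to the embedded parser and
    // everything else to the host parser.
    static List<String> parse(String input, String open, String close,
                              ComponentParser host, ComponentParser embedded) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf(open, pos);
            if (start < 0) {                        // no more guards: host keeps control
                results.add(host.parse(input.substring(pos)));
                break;
            }
            if (start > pos) {                      // host text before the guard
                results.add(host.parse(input.substring(pos, start)));
            }
            int bodyStart = start + open.length();
            int end = input.indexOf(close, bodyStart);
            if (end < 0) {
                throw new IllegalArgumentException("missing closing guard " + close);
            }
            results.add(embedded.parse(input.substring(bodyStart, end)));
            pos = end + close.length();             // control returns to the host
        }
        return results;
    }
}
```

For an integral language such as Java there are no such guard strings between, say, statements and expressions, so this kind of hand-off has no obvious switching point.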
Ideal vs. reality
[Figure: components of a Java grammar. Box labels: Java, Class, Interface, Type, Object_type, Array, Statement, Expression, Binary_expr, Unary_expr, Primary.]
Suggestions?
Aspect-oriented grammar
Aspect-oriented grammar
• Join-point: grammar patterns that crosscut
multiple productions
• Punctuations, identifiers, modifiers…
Example
• “;” appears 25 times in one of the Java
grammars
• “.” appears 74 times in one of the Cobol
grammars
• Every one of them should be carefully
placed!
<Sentence> ::= <Accept Stm> '.'
| <Open Stm> '.'
| <Add Stm> '.'
| <Perform Stm> '.'
| <Add Stm Ex> <End-Add Opt> '.'
| <Perform Stm Ex> <End-Perform Opt> '.'
| <Call Stm> '.'
| <Read Stm> '.'
| <Call Stm Ex> <End-Call Opt> '.'
| <Read Stm Ex> <End-Read Opt> '.'
| <Close Stm> '.'
| <Release Stm> '.'
| <Compute Stm> '.'
| <Rewrite Stm> '.'
| <Compute Stm Ex> <End-Compute Opt> '.'
| <Rewrite Stm Ex> <End-Rewrite Opt> '.'
| <Display Stm> '.'
| <Set Stm> '.'
| <Divide Stm> '.'
| <Start Stm> '.'
| <Divide Stm Ex> <End-Divide Opt> '.'
| <Start Stm Ex> <End-Start Opt> '.'
| <Evaluate Stm> <End-Evaluate Opt> '.'
| <String Stm> '.'
| <If Stm> <End-If Opt> '.'
| <String Stm Ex> <End-String Opt> '.'
| <Move Stm> '.'
| <Subtract Stm> '.'
| <Move Stm Ex> <End-Move Opt> '.'
| <Subtract Stm Ex> <End-Substract Opt> '.'
| <Multiply Stm> '.'
| <Write Stm> '.'
| <Multiply Stm Ex> <End-Multiply Opt> '.'
| <Write Stm Ex> <End-Write Opt> '.'
| <Unstring Stm> '.'
| <Unstring Stm Ex> <End-Unstring Opt> '.'
| <Misc Stm> '.'

pointcut PreDot(): <Sentence>;


after PreDot(): '.'
Another example
pointcut Content(): … …
before Content(): “(”;
after Content(): “)”;

Guarantee they match!
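One plausible reading of the two advice examples, sketched in Java: the base production lists its alternatives without the crosscutting terminal, and a weaver adds the before/after terminals to every alternative the pointcut selects. GrammarWeaver and weave are hypothetical names, and a pointcut is assumed here to select all alternatives of one production.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical weaver for illustration only.
final class GrammarWeaver {
    // Returns the alternatives with the advice terminals woven in.
    // Pass null for before or after when there is no such advice.
    static List<String> weave(List<String> alternatives, String before, String after) {
        List<String> woven = new ArrayList<>();
        for (String alt : alternatives) {
            StringBuilder sb = new StringBuilder();
            if (before != null) {
                sb.append(before).append(' ');   // "before" advice
            }
            sb.append(alt.trim());
            if (after != null) {
                sb.append(' ').append(after);    // "after" advice
            }
            woven.add(sb.toString());
        }
        return woven;
    }

    public static void main(String[] args) {
        List<String> sentence = new ArrayList<>();
        sentence.add("<Accept Stm>");
        sentence.add("<Add Stm Ex> <End-Add Opt>");
        // after PreDot(): '.'
        weave(sentence, null, "'.'").forEach(System.out::println);
    }
}
```

Because one call adds both the opening and the closing terminal, a pair such as “(” and “)” cannot get out of step across alternatives.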


Grammar weaving
[Figure: grammar weaving. Box labels: Base Grammar, Grammar Aspect, Result grammar, Parser.]
What do you think?