
Working with SableCC

Prasad.A
About
This document is about getting ready to work with SableCC. Please refer to the SableCC documentation
(http://www.sablecc.org) to learn more about it.
I am documenting the steps I followed to get my first application working with SableCC.
I wanted to develop a simple INI (Initialization) file parser with SableCC.
Setting up the project:
I created the following directory structure:
INIParser
├── src
├── classes
├── grammar
└── test
I then created the Ant build file (build.xml) under the INIParser directory:
<?xml version="1.0"?>
<project name="INIParser" default="dist" basedir=".">
    <property name="src.dir" value="src"/>
    <property name="classes.dir" value="classes"/>
    <property name="grammar.dir" value="grammar"/>
    <property name="lib.jar" value="iniparser.jar"/>
    <!-- Add sablecc-anttask.jar to Ant's lib directory (preferably renamed
         to ant-sablecc.jar). -->
    <!-- Comment out the line below if you have defined the Ant task inside your IDE. -->
    <taskdef name="sablecc" classname="org.sablecc.ant.taskdef.Sablecc"></taskdef>
    <target name="init">
        <mkdir dir="${classes.dir}"/>
    </target>
    <target name="clean">
        <delete dir="${classes.dir}"/>
        <delete file="${lib.jar}"/>
    </target>
    <target name="precompile">
        <!-- Put the generated files in the src directory; it is good to declare
             a package name inside the grammar file. -->
        <!-- Clean the previously generated grammar sources. -->
        <echo message="Deleting the source directory: ${src.dir}/grammar/"></echo>
        <delete dir="${src.dir}/grammar/"/>
        <sablecc src="${grammar.dir}" includes="*.sablecc, *.grammar"
                 outputdirectory="${src.dir}"></sablecc>
        <!-- Copy the generated .dat files that are needed at runtime. -->
        <copy todir="${classes.dir}">
            <fileset dir="${src.dir}">
                <include name="**/*.dat"/>
                <include name="**/*.txt"/>
            </fileset>
        </copy>
    </target>
    <target name="compile" depends="init, precompile">
        <javac srcdir="${src.dir}" destdir="${classes.dir}" debug="true"/>
    </target>
    <target name="dist" depends="compile">
        <jar basedir="${classes.dir}" destfile="${lib.jar}"/>
    </target>
</project>
SableCC comes with Ant task support. The task is usually packaged as sablecc-anttask.jar; copy it to
Ant's lib directory and name it ant-sablecc.jar (a convention).
If you look at the build file, you will see that I have defined an Ant task:
<taskdef name="sablecc" classname="org.sablecc.ant.taskdef.Sablecc"></taskdef>
In the precompile target we first delete the old generated files in the src/grammar directory. We
then use the sablecc task to convert our grammar files (those ending in .sablecc or .grammar) into
Java source under the src directory. (Note: because only src/grammar is cleaned, the Package declared
in the grammar file must start with grammar, so that the generated sources land under src/grammar.)
<sablecc src="${grammar.dir}" includes="*.sablecc, *.grammar"
         outputdirectory="${src.dir}"></sablecc>
SableCC also generates .dat files (lexer.dat and parser.dat) that the generated lexer and parser load
at runtime, so we need to copy them to the classes directory:
<copy todir="${classes.dir}">
    <fileset dir="${src.dir}">
        <include name="**/*.dat"/>
        <include name="**/*.txt"/>
    </fileset>
</copy>
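If the .dat files do not end up on the classpath, the generated lexer or parser typically fails as soon as it is first used, complaining that lexer.dat or parser.dat is missing or corrupted. The small Java sketch below checks for them before parsing. The resource paths are assumptions: they presume the files sit next to the generated Lexer and Parser classes, i.e. under grammar/ini/lexer and grammar/ini/parser for grammar.ini, the package used by the first grammar in this document. Adjust them to your own Package.

```java
import java.util.ArrayList;
import java.util.List;

class DatFileCheck {
    // Hypothetical locations, assuming Package grammar.ini and that SableCC
    // keeps lexer.dat/parser.dat next to the generated Lexer and Parser.
    static final String[] DEFAULT_PATHS = {
        "grammar/ini/lexer/lexer.dat",
        "grammar/ini/parser/parser.dat"
    };

    // Returns the resource paths that cannot be found on the classpath.
    static List<String> missingResources(String... paths) {
        ClassLoader loader = DatFileCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            if (loader.getResource(path) == null) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingResources(DEFAULT_PATHS);
        if (missing.isEmpty()) {
            System.out.println("All .dat files found on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

Run it with the same classpath as the parser (for example, with the classes directory produced by ant compile on the classpath).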
INI File structure:
[SectionName]
keyname=value
;comment
keyname=value, value, value ;comment
To keep it simple, let us handle only:
[SectionName]
keyname=value
The value cannot be blank. Both keyname and value must be alphanumeric and start with a letter (the
grammar below also allows underscores and dashes).
First Version of SableCC Grammar:
Package grammar.ini;

Helpers
    all = [ 0 .. 0xFFFF ];
    digit = ['0' .. '9'];
    lowercase = ['a' .. 'z'];
    uppercase = ['A' .. 'Z'];
    alpha = [lowercase + uppercase];
    underscore = '_';
    dash = '-';
    extalpha = [[underscore + dash] + alpha];
    semicolon = ';';
    tab = 9;
    lf = 10;
    cr = 13;
    eol = cr lf | cr | lf;
    not_cr_lf = [all - [cr + lf]];

Tokens
    lbracket = '[';
    rbracket = ']';
    equal = '=';
    comma = ',';

    blank = (' ' | tab | lf | cr)+;
    name = (extalpha (extalpha | digit)*);
    comment = semicolon not_cr_lf* eol;

Ignored Tokens
    blank,
    comment;

Productions
    inidoc = sections ;
    sections = section* ;
    section = lbracket name rbracket sectiondata*;
    sectiondata = [key]:name equal [value]:name;
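To get a feel for what the name token accepts, it can be mirrored with a java.util.regex pattern. This is only an approximation for experimenting, not how SableCC works (SableCC compiles the grammar into its own lexer tables); as in the lowercase and uppercase helpers, alpha is limited to ASCII letters. The class and method names are made up for this illustration.

```java
import java.util.regex.Pattern;

class NameTokenSketch {
    // extalpha = [[underscore + dash] + alpha];  (alpha = ASCII a-z, A-Z)
    private static final String EXTALPHA = "[_\\-A-Za-z]";
    // name = (extalpha (extalpha | digit)*);
    private static final Pattern NAME =
        Pattern.compile(EXTALPHA + "(?:" + EXTALPHA + "|[0-9])*");

    // True if the whole string would form a single name token.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"device", "input0", "_Internal", "i386", "0abc", "key name", "1.0"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isName(s));
        }
    }
}
```

Values such as 1.0 or anything starting with a digit are rejected, which matches the "keep it simple" restriction on keys and values.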
A look at the grammar file:
We are defining the package name for the Java source: grammar.ini.
The source files get created under the grammar/ini directory (src/grammar/ini with our build file).
The Helpers section defines reusable character sets and expressions that can be used in token
definitions. No Java class is generated for helpers.
The Tokens section defines the actual tokens used in the Productions. Each token gets an associated
Java class whose name is the token name with its first letter capitalized, prefixed with 'T'.
The generated token classes (in the grammar.ini.node package) are:
TLbracket, TRbracket, TEqual, TComma, TBlank, TName, TComment.
The Ignored Tokens section declares the tokens that the parser should ignore.
The Productions section defines the productions of the grammar. An abstract class prefixed with 'P'
is generated for each production (in the grammar.ini.node package):
PInidoc, PSections, PSection, PSectiondata.
A concrete class prefixed with 'A' is also generated for each alternative of a production (in the
grammar.ini.node package); it holds the elements of that alternative:
AInidoc, ASections, ASection, ASectiondata. These are the classes we deal with when we analyze the
parse tree.
(NOTE:
1. If a production has more than one alternative, each alternative must be named.
E.g. names = {multiple} name comma names | {single} name
2. Since the same token, name, is used for both key and value, the two elements must be given
distinct names, [key]: and [value]:. You can use any names you want. They become quite handy
when analyzing the tree, because the generated class then provides getKey() and getValue().)
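The naming rules above (T for tokens, P for productions, A for alternatives, underscores dropped and each word capitalized, as in inidoc_ast becoming AInidocAst in the second version of the grammar) can be summarized in a few lines of Java. This is an illustrative helper written for this document, not part of SableCC, and it only covers simple names like the ones used here.

```java
class SableCcNames {
    // "ast_section" -> "AstSection": drop underscores, capitalize each word.
    static String camel(String grammarName) {
        StringBuilder sb = new StringBuilder();
        for (String part : grammarName.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    static String tokenClass(String token) { return "T" + camel(token); }

    static String productionClass(String production) { return "P" + camel(production); }

    // Unnamed alternative: "A" + production name.
    // Named alternative {alt}: "A" + alternative name + production name.
    static String alternativeClass(String alt, String production) {
        return "A" + (alt == null ? "" : camel(alt)) + camel(production);
    }

    public static void main(String[] args) {
        System.out.println(tokenClass("lbracket"));                // TLbracket
        System.out.println(productionClass("sectiondata"));        // PSectiondata
        System.out.println(alternativeClass(null, "inidoc_ast"));  // AInidocAst
        System.out.println(alternativeClass("multiple", "names")); // AMultipleNames
    }
}
```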
SableCC automatically generates tree walkers (analyzers) such as DepthFirstAdapter in the analysis
package. We shall write our own version, which groups the (key, value) pairs of each section and
collects all the sections in one map.
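The grouping itself, a map from section name to a map of key/value pairs, can be sketched in plain Java with generics and Java 8's Map.computeIfAbsent. This is an alternative written for this document, not the author's Java 1.4 code; the class name is hypothetical, and LinkedHashMap is a design choice that keeps sections and keys in file order, whereas the HashMap used by the transformer prints them in an arbitrary order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SectionGrouper {
    // section name -> (key -> value), kept in insertion order
    private final Map<String, Map<String, String>> inidoc = new LinkedHashMap<>();

    void addToSection(String section, String key, String value) {
        // Create the inner map the first time a section is seen.
        inidoc.computeIfAbsent(section, s -> new LinkedHashMap<>()).put(key, value);
    }

    Map<String, Map<String, String>> getResult() {
        return inidoc;
    }

    public static void main(String[] args) {
        SectionGrouper g = new SectionGrouper();
        g.addToSection("Globals", "device", "computer");
        g.addToSection("Locals", "input0", "mouse");
        g.addToSection("Locals", "input1", "keyboard");
        System.out.println(g.getResult());
        // {Globals={device=computer}, Locals={input0=mouse, input1=keyboard}}
    }
}
```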
Our Tree Transformer/Analyzer:
/*
 * Created in March 2006
 */
package transform;

import grammar.ini.analysis.DepthFirstAdapter;
import grammar.ini.node.ASection;
import grammar.ini.node.ASectiondata;
import java.util.HashMap;
import java.util.Map;

/**
 * Collects the key/value pairs of each section into a map of maps.
 * @author prasad
 */
public class INITransform extends DepthFirstAdapter {
    // section name -> (key -> value)
    private Map inidoc = new HashMap();
    // name of the section currently being visited
    private String secname = "UNNAMED";

    public Map getResult() { return inidoc; }

    private void addToSection(String secname, String key, String value) {
        Map hmap = (Map) inidoc.get(secname);
        if (hmap == null) {
            // first key seen for this section
            hmap = new HashMap();
            inidoc.put(secname, hmap);
        }
        hmap.put(key, value);
    }

    // Called before the children of a section node are visited.
    public void inASection(ASection node) {
        secname = node.getName().getText().trim();
    }

    // Called after a "key = value" node has been visited.
    public void outASectiondata(ASectiondata node) {
        String key = node.getKey().getText().trim();
        String value = node.getValue().getText().trim();
        addToSection(secname, key, value);
    }
}
The inAXxx, caseAXxx and outAXxx methods are called before, during and after the analysis of a
production node, respectively (caseAXxx drives the visit: it calls inAXxx, visits the children, then
calls outAXxx). In our case, inASection records the name of the section being analyzed as soon as a
section production is entered.
When the analysis of a sectiondata production is completed, outASectiondata adds the key and value
to the section currently being analyzed.
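The calling order can be illustrated without SableCC using a tiny hand-written tree walker. The Data, Section and Walker classes below are hypothetical stand-ins that only imitate the shape of a depth-first adapter; the real DepthFirstAdapter is generated by SableCC from the grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VisitOrderSketch {
    // Hypothetical stand-ins for SableCC-generated node classes.
    static class Data {
        final String key, value;
        Data(String key, String value) { this.key = key; this.value = value; }
    }

    static class Section {
        final String name;
        final List<Data> data;
        Section(String name, List<Data> data) { this.name = name; this.data = data; }
    }

    // Imitates a depth-first adapter: each caseXxx calls inXxx,
    // visits the children, then calls outXxx.
    static class Walker {
        final List<String> events = new ArrayList<>();
        String secname = "UNNAMED";

        void inSection(Section s) { secname = s.name; events.add("in " + s.name); }
        void outSection(Section s) { events.add("out " + s.name); }
        void inData(Data d) { events.add("in " + d.key); }
        void outData(Data d) { events.add("out " + d.key + " (section " + secname + ")"); }

        void caseData(Data d) { inData(d); outData(d); }

        void caseSection(Section s) {
            inSection(s);
            for (Data d : s.data) {
                caseData(d);
            }
            outSection(s);
        }
    }

    public static void main(String[] args) {
        Walker w = new Walker();
        w.caseSection(new Section("Locals",
            Arrays.asList(new Data("input0", "mouse"), new Data("input1", "keyboard"))));
        w.events.forEach(System.out::println);
    }
}
```

Because inSection runs before any of the section's data nodes are visited, the name it records is still in effect when each outData fires, which is what INITransform relies on.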
The above process would be simpler if we could create the "Abstract Syntax Tree" ourselves instead
of traversing/analyzing the "Concrete Syntax Tree" generated by SableCC.
The feature of converting the "Concrete Syntax Tree" (CST) to an "Abstract Syntax Tree" (AST) is
available starting with SableCC 3 beta 3. Please note that the final SableCC 3 release uses generics,
which are part of Java 1.5. I'm using 3-beta3, which supports CST-to-AST conversion and uses Java 1.4
syntax.
Our parser's main code:
/*
 * Created in March 2006
 */
package parser;

import grammar.ini.lexer.Lexer;
import grammar.ini.lexer.LexerException;
import grammar.ini.node.Start;
import grammar.ini.parser.Parser;
import grammar.ini.parser.ParserException;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;
import transform.INITransform;

/**
 * @author prasad
 */
public class INIParser {
    public static void main(String[] args)
            throws ParserException, LexerException, IOException {
        File file = new File("test", "sample.ini");
        // The lexer may need to push back more than one character,
        // so give the reader a generous pushback buffer.
        PushbackReader pushbackReader =
            new PushbackReader(new FileReader(file), 1024);
        Parser parser = new Parser(new Lexer(pushbackReader));
        // Parse the input into the concrete syntax tree (CST).
        Start tree = parser.parse();
        System.out.println("CSTTree.........\n" + tree.toString());
        // Walk the tree and collect the sections.
        INITransform transformer = new INITransform();
        tree.apply(transformer);
        System.out.println("Transform Result.........\n" +
            transformer.getResult());
        pushbackReader.close();
    }
}
Our input file (sample.ini in the test directory):
; Variable definition for program
[Globals]
device = computer
[Locals]
input0 = mouse
input1 = keyboard
[_Internal]
hardware=i386
[interrupt]
hardintr=aF
Output of the parser:
CSTTree.........
[ Globals ] device = computer [ Locals ] input0 = mouse input1 = keyboard
[ _Internal ] hardware = i386 [ interrupt ] hardintr = aF
Transform Result.........
{interrupt={hardintr=aF}, _Internal={hardware=i386}, Globals={device=computer},
Locals={input1=keyboard, input0=mouse}}
Second Version of SableCC Grammar:
In this version, we are going to improve the previous grammar and build our own Abstract Syntax
Tree, which makes analyzing the parse tree easier.
Package grammar.iniast;

Helpers
    all = [ 0 .. 0xFFFF ];
    digit = ['0' .. '9'];
    lowercase = ['a' .. 'z'];
    uppercase = ['A' .. 'Z'];
    alpha = [lowercase + uppercase];
    underscore = '_';
    dash = '-';
    extalpha = [[underscore + dash] + alpha];
    semicolon = ';';
    tab = 9;
    lf = 10;
    cr = 13;
    eol = cr lf | cr | lf;
    not_cr_lf = [all - [cr + lf]];

Tokens
    lbracket = '[';
    rbracket = ']';
    equal = '=';

    blank = (' ' | tab | lf | cr)+;
    name = (extalpha (extalpha | digit)*);
    comment = semicolon not_cr_lf* eol;

Ignored Tokens
    blank,
    comment;

Productions
    inidoc { -> inidoc_ast } = sections
        { -> New inidoc_ast([sections.ast_section]) };
    sections { -> ast_section* } = section*
        { -> [section.ast_section] };
    section { -> ast_section } = lbracket name rbracket sectiondata*
        { -> New ast_section(name, [sectiondata.ast_data]) };
    sectiondata { -> ast_data } = [key]:name equal [value]:name
        { -> New ast_data(key, value) };

Abstract Syntax Tree
    inidoc_ast = ast_section*;
    ast_section = name ast_data*;
    ast_data = [key]:name [value]:name;
A look at the grammar file:
We are defining the package name for the Java source: grammar.iniast.
The source files get created under the grammar/iniast directory.
A new section, Abstract Syntax Tree, is added. It contains the rules that define what the tree
should look like. Each production is now decorated with transformation rules of the form:
production_name { -> nodetype } = production_rule { -> New nodetype(parameters) }
E.g. inidoc { -> inidoc_ast } = sections { -> New inidoc_ast([sections.ast_section]) };
creates a node of type inidoc_ast (Java class AInidocAst) with the list of ast_section nodes (Java
class AAstSection) as its parameter. This corresponds to the AST rule:
inidoc_ast = ast_section*;
(NOTE: [production_rule_node.resultNode] is used to create a list of result nodes. To create a
single node of a different type, New is used. Please refer to SableCC's documentation.)
Notice how the AST rules and the decorations in the Productions section work together.
sections { -> ast_section* } = section* { -> [section.ast_section] }
makes sure that the resulting AST node is a list of the result nodes of the section production,
which is nothing but:
section { -> ast_section } = lbracket name rbracket sectiondata*
    { -> New ast_section(name, [sectiondata.ast_data]) }
The resulting ast_section node takes the section name and the list of section data as parameters.
sectiondata { -> ast_data } = [key]:name equal [value]:name { -> New ast_data(key, value) }
Each sectiondata production results in an ast_data node, which contains the key and value
name tokens.
The parameters passed during node creation (New nodetype(parameters)) are checked against the number
and types of parameters defined in the AST rule. In case of a mismatch, SableCC reports an error and
the parser is not generated.
inidoc_ast = ast_section*;
This AST rule expects one parameter, which is a list of ast_section nodes.
ast_section = name ast_data*;
This AST rule expects two parameters: a name token and a list of ast_data nodes.
ast_data = [key]:name [value]:name;
This AST rule expects two parameters, both of which are name tokens.
You can see the simplified AST in the grammar above. The AST definitions keep the nodes clean by
keeping tokens like equal, lbracket and rbracket out of them.
This greatly reduces the effort during parse tree analysis.
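One way to picture the three AST rules is as plain Java types: each rule becomes a class whose fields are the rule's parameters, and a * becomes a list. The sketch below is only an analogy (the generated AInidocAst, AAstSection and AAstData classes extend SableCC's node hierarchy and hold TName tokens rather than strings); the class and method names are hypothetical. It shows that flattening such a tree into the section map takes nothing more than two nested loops, with no equal or bracket tokens to skip.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AstModelSketch {
    // ast_data = [key]:name [value]:name;
    static final class AstData {
        final String key, value;
        AstData(String key, String value) { this.key = key; this.value = value; }
    }

    // ast_section = name ast_data*;
    static final class AstSection {
        final String name;
        final List<AstData> data;
        AstSection(String name, List<AstData> data) { this.name = name; this.data = data; }
    }

    // inidoc_ast = ast_section*;
    static final class InidocAst {
        final List<AstSection> sections;
        InidocAst(List<AstSection> sections) { this.sections = sections; }
    }

    // Flatten the tree into section -> (key -> value).
    static Map<String, Map<String, String>> toMap(InidocAst doc) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (AstSection section : doc.sections) {
            Map<String, String> entries =
                result.computeIfAbsent(section.name, n -> new LinkedHashMap<>());
            for (AstData d : section.data) {
                entries.put(d.key, d.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InidocAst doc = new InidocAst(Arrays.asList(
            new AstSection("Globals", Arrays.asList(new AstData("device", "computer"))),
            new AstSection("Locals", Arrays.asList(
                new AstData("input0", "mouse"), new AstData("input1", "keyboard")))));
        System.out.println(toMap(doc));
        // {Globals={device=computer}, Locals={input0=mouse, input1=keyboard}}
    }
}
```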
Our Tree Transformer/Analyzer:
/*
 * Created in March 2006
 */
package transform;

import grammar.iniast.analysis.DepthFirstAdapter;
import grammar.iniast.node.AAstData;
import grammar.iniast.node.AAstSection;
import grammar.iniast.node.AInidocAst;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Builds the section map from our own AST.
 * @author prasad
 */
public class INIASTTransform extends DepthFirstAdapter {
    // section name -> (key -> value)
    private Map inidoc = new HashMap();

    public Map getResult() { return inidoc; }

    private void addToSection(String secname, String key, String value) {
        Map hmap = (Map) inidoc.get(secname);
        if (hmap == null) {
            hmap = new HashMap();
            inidoc.put(secname, hmap);
        }
        hmap.put(key, value);
    }

    // Called once, after the whole inidoc_ast tree has been visited.
    public void outAInidocAst(AInidocAst node) {
        List sections = node.getAstSection();
        for (int index = 0; index < sections.size(); ++index) {
            AAstSection section = (AAstSection) sections.get(index);

            String secname = section.getName().getText();

            List datalist = section.getAstData();
            for (int didx = 0; didx < datalist.size(); ++didx) {
                AAstData astdata = (AAstData) datalist.get(didx);
                addToSection(secname,
                    astdata.getKey().getText(),
                    astdata.getValue().getText());
            }
        }
    }
}
As you can see, the AST transformation rules have simplified the task of analyzing the tree to an
acceptable extent (at least it is easy to understand). We handle only one case, outAInidocAst(),
which gets called after the root AST node has been processed. The primary advantage of having our
own AST is that even if the production rules change, the transformation code need not change, as
long as the same tree is generated using the same AST rules. The only thing to take care of is that
the changed production rules still create the right type of AST nodes.
Our parser's main code:
/*
 * Created in March 2006
 */
package parser;

import grammar.iniast.lexer.Lexer;
import grammar.iniast.lexer.LexerException;
import grammar.iniast.node.Start;
import grammar.iniast.parser.Parser;
import grammar.iniast.parser.ParserException;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PushbackReader;
import transform.INIASTTransform;

/**
 * @author prasad
 */
public class INIASTParser {
    public static void main(String[] args)
            throws ParserException, LexerException, IOException {
        File file = new File("test", "sample.ini");
        // Generous pushback buffer for the generated lexer.
        PushbackReader pushbackReader =
            new PushbackReader(new FileReader(file), 1024);
        Parser parser = new Parser(new Lexer(pushbackReader));
        // With the AST transformations, parse() returns our own AST.
        Start tree = parser.parse();
        System.out.println("Our ASTTree.........\n" + tree.toString());
        INIASTTransform transformer = new INIASTTransform();
        tree.apply(transformer);
        System.out.println("Transform Result.........\n" +
            transformer.getResult());
        pushbackReader.close();
    }
}
With the same input INI file (as before), the output is:
Our ASTTree.........
Globals device computer Locals input0 mouse input1 keyboard _Internal hardware i386
interrupt hardintr aF
Transform Result.........
{interrupt={hardintr=aF}, _Internal={hardware=i386}, Globals={device=computer},
Locals={input1=keyboard, input0=mouse}}
You can see the difference in the tree (its toString() version), even though the transform result
is the same.
CSTTree.........
[ Globals ] device = computer [ Locals ] input0 = mouse input1 = keyboard
[ _Internal ] hardware = i386 [ interrupt ] hardintr = aF
Our ASTTree.........
Globals device computer Locals input0 mouse input1 keyboard _Internal hardware
i386 interrupt hardintr aF
I hope this documentation helped you get started with SableCC. Feedback is appreciated.
Email id: prasad?9.a@gmail.com
Thank you.
