You are on page 1of 21

Use JavaCC to build a user friendly boolean query

language
Now you're talking my language!

JoAnn Brereton January 15, 2004

JavaCC is a very powerful "compiler compiler" tool that can be used to formulate context-free
grammars. This article demonstrates how JavaCC can be used to allow end users to formulate
simple boolean queries against a DB2 UDB database.

Introduction to JavaCC
Many Web-based projects contain ad-hoc query systems that allow end users to search for
information. For this, some sort of language is needed for the end users to convey what it is
they wish to search for. Sometimes, a user query language is defined quite simply. If your end
user is content to have a language as simple as your most typical Google search, then Java's
StringTokenizer is more than up to the task for parsing. However, if a customer wishes to have a
more robust language, perhaps adding in parentheses and "AND"/"OR" logic, then we quickly find
ourselves in need of a bigger hammer. We need a means to first define the language that the user
will be using and then use that definition to parse the user's entries and formulate the correct query
to whatever back-end database we have.

That's where a tool like JavaCC comes in. JavaCC stands for "Java® Compiler Compiler", a name
that stands in homage to YACC, "Yet Another Compiler Compiler", the C-based tool developed
by AT&T for the purposes of building parsers for C and other high-level languages. YACC and
its cohort lexical tokenizer, "Lex", accepted as input a definition of a language in a common form
known as Backus-Naur form (BNF, aka Backus Normal Form) and emitted a "C" program that
could parse input in that language and perform functions on it. JavaCC, like YACC, is designed
to speed the process of developing parser logic for languages. Whereas YACC emitted C code,
however, JavaCC, as you might expect, emits Java code.

JavaCC's history is storied. It began its life in Sun, under the name "Jack". Jack later became
JavaCC which passed through several owners, notably Metamata and WebGain, before coming
back to Sun, who then released it as Open Source code under the BSD license.

JavaCC's strength is in its simplicity and extensibility. To compile the Java code generated by
JavaCC, no outside JAR files or directories are required. Simply compile with the base 1.2 Java

© Copyright IBM Corporation 2004 Trademarks


Use JavaCC to build a user friendly boolean query language Page 1 of 21
developerWorks® ibm.com/developerWorks/

compiler. The layout of the language also makes it easy to add production rules and behaviors.
The Web site even describes how you can tailor exceptions to give your user targetted syntax
hints.

The Problem Definition


Let us suppose that you have a customer at a video rental store that has a simple database of
movies. The database consists of tables for MOVIES, ACTORS, and KEYWORDS. The MOVIES
table lists the pertinent data for each movie in his shop, namely things like the title and director for
each movie. The ACTORS table lists the names of actors in each movie. The KEYWORDS table
lists words describing the movie, such as words like "action", "drama", "adventure", and so on.

The customer would like to be able to issue somewhat sophisticated queries against this database.
He would, for example, like to enter a query of the form

actor = "Christopher Reeve" and keyword=action and keyword=adventure

and have it return the Superman movies featuring Christopher Reeve. He would also like to use
parentheses to indicate order of evaluation to distinguish queries like

(actor = "Christopher Reeve" and keyword=action) or keyword=romance

which might return movies that do not feature Christopher Reeve from

actor = "Christopher Reeve" and (keyword=action or keyword=romance)

which should always return movies starring Christopher Reeve.

The Solution
For this exercise, you will define the solution in two stages. In the stage 1, you will define the
language in JavaCC, ensuring that end-user queries are parsed correctly. In stage 2, you will add
behavior to JavaCC code in order to produce DB2® SQL code that will ensure that the correct
movies are returned in answer to the end-user queries.

Stage 1 - Defining the User's Query Language


The language will be defined inside of a file named UQLParser.jj. This file will be "compiled" by the
JavaCC tool into a set of Java class files of type .java. To define a language in a JJ file, you need
to do five things:

1. Define the context of parsing


2. Define "white space"
3. Define "tokens"
4. Define the syntax of the language itself in terms of the tokens
5. Define the behavior that will happen at each stage of parsing
You can define your own UQLParser.jj file using the segments presented, or you can follow along
using the code associated with this paper. Use the copy of UQLParser.jj in JavaCCPaper/stage1/

Use JavaCC to build a user friendly boolean query language Page 2 of 21


ibm.com/developerWorks/ developerWorks®

src for steps 1 through 4. Step 5 is covered in JavaCCPaper/stage2/src. DDL for the sample
database can be found in JavaCCPaper/moviedb.sql. The example works best if the database is
created and the parser run using the same userid. Ant files (build.xml) are provided to speed to
compilation process.

Step 1. Define the context of parsing

JavaCC .jj files are translated by the JavaCC executable into .java files. JavaCC takes the portion
of the .jj file that lives between PARSER_BEGIN and ends with PARSER_END and copies it
directly into the resulting java file. This is the place where you, the parser designer, can place all of
your pre- and post-parsing activity related to your parser. This is also the place where you can link
your java code to the parser actions you will define in steps 4 and 5.

In the example shown below, the parser is relatively simple. The constructor UQLParser takes
as input a String, reading it in through Java's java.io.StringReader class, and then calls another,
unseen constructor with the StringReader cast to Reader. The only other method defined here
is the static main method which calls the constructor and then calls an as-yet undefined method
called parse().

As you might have already guessed, JavaCC already provides a constructor from Java's Reader
class. We have added the String-based constructor for ease of use and testing.

Listing 1. The parser's Java context


PARSER_BEGIN(UQLParser)

package com.demo.stage1;

import java.io.StringReader;
import java.io.Reader;

public class UQLParser {

/**
A String based constructor for ease of use.
**/
public UQLParser(String s)
{
this((Reader)(new StringReader(s)));

public static void main(String args[])


{
try
{
String query = args[0];
UQLParser parser = new UQLParser(query);
parser.parse();
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
PARSER_END(UQLParser)

Use JavaCC to build a user friendly boolean query language Page 3 of 21


developerWorks® ibm.com/developerWorks/

Step 2. Define white space

In this language, you wish to treat as delimiters but otherwise ignore spaces, tabs, carriage
returns and newlines. These characters are known as white space. In JavaCC, we define these
characters in the SKIP section, as indicated in Listing 2.

Listing 2. Defining white space in the SKIP section


/** Skip these characters, they are considered "white space" **/
SKIP :
{
" "
| "\t"
| "\r"
| "\n"

Step 3. Define the tokens

Next, you define what tokens you will recognize in the language. A token is the smallest unit of
parsed string that will have meaning to the parsing program. The process that scans an input
string and determines what the tokens are is called the tokenizer. In the query,

actor = "Christopher Reeve"

The tokens are

• actor
• =
• "Christopher Reeve"
In your language, you will make actor and the equal sign (=) reserved tokens in the language,
much as the words if and instanceof are reserved tokens with particular meaning within the
Java language. By reserving words and other special tokens, the programmer promises that
the parser will recognize these words literally, assigning a specific meaning to them. So long as
you're reserving words, go ahead and reserve the not equal sign (<>), and, or, the left and right
parentheses. Also reserve title, director, and keyword to represent specific fields for the user to
search.

To define all of these, use the TOKEN directive of JavaCC. Each token definition is enclosed in
angle-brackets, (< and >). The token's name is given on the left hand side of the colon ( : ) and
a regular expression is given on the right hand side. A regular expression is a way of defining a
portion of text to be matched. In its simplest form, regular expressions can match exact sequences
of characters. Use the code below to define six tokens that match exact words and four others that
match symbols. When the parser sees any of the words, it will match them symbolically as AND,
OR, TITLE, ACTOR, DIRECTOR, or KEYWORD. When the symbols are matched, the parser will
return LPAREN, RPAREN, EQUALS, or NOTEQUAL accordingly. Listing 3 shows the JavaCC
reserved tokens definition.

Use JavaCC to build a user friendly boolean query language Page 4 of 21


ibm.com/developerWorks/ developerWorks®

Listing 3. Defining reserved tokens


TOKEN: /*RESERVED TOKENS FOR UQL */
{
<AND: "and">
| <OR: "or">
| <TITLE: "title">
| <ACTOR: "actor">
| <DIRECTOR: "director">
| <KEYWORD: "keyword">
| <LPAREN: "(">
| <RPAREN: ")">
| <EQUALS: "=">
| <NOTEQUAL: "<>">
}

For strings like "Christopher Reeve", you cannot possibly set aside all the names of actors
as reserved words in our language. Instead, you will recognize tokens of type STRING or
QUOTED_STRING using a character pattern defined using regular expressions. Regular
expressions are strings of characters that define a pattern to be matched. Defining the regular
expression that will match all strings or quoted strings is a bit trickier than defining exact word
matches.

You will define a STRING as a series of one or more characters, the valid characters being A
through Z, upper and lower case, and the numerals 0 through 9. For the purposes of simplification,
don't worry about movie stars or movie titles with accented characters or other anomalies. You can
write this pattern as a regular expression in the following way.

<STRING : (["A"-"Z", "a"-"z", "0"-"9"])+ >

The plus sign indicates that the pattern enclosed in the parentheses (any character from A to Z,
a to z, or 0 to 9) should occur one or more times in sequence. In JavaCC, you may also use the
asterisk (*) to indicate zero or more appearances of a pattern or a question mark (?) to indicate 0
or 1 repetition.

QUOTED_STRING is a little trickier. You will define a string to be a QUOTED_STRING if it begins


with a quote, ends with a quote and has any other character in between. The regular expression
for this is "\"" (~["\""])+ "\"" which certainly is an eyeful. To simplify, it helps to understand
that since the quote character itself has meaning to JavaCC, we need to escape the quote for it
to be meaningful to our language instead of JavaCC. To escape the quote, we use a backward
slash to precede it. The tilde character (~) means NOT to the JavaCC tokenizer. (~["\""])+
is shorthand for one or more non-quote characters. Taken altogether, "\"" (~["\""])+ "\""
means ?a quote followed by one or more non-quotes followed by a quote".

You should add the tokenizer rules for STRING and QUOTED_STRING after the rules for the
reserved words. It is important to preserve this ordering because the order that tokenizer rules
appear in the file are the order in which token rules are applied. You want to be very sure that
"title" is taken as the reserved word rather than as a STRING. The complete STRING and
QUOTED_STRING token definition appears in Listing 4.

Use JavaCC to build a user friendly boolean query language Page 5 of 21


developerWorks® ibm.com/developerWorks/

Listing 4. Defining STRING and QUOTED_STRING


TOKEN :
{
<STRING : (["A"-"Z", "0"-"9"])+ >
<QUOTED_STRING: "\"" (~["\""])+ "\"" >
}

Step 4. Define the language in terms of the tokens

Now that you have the tokens defined, it is time to define the parsing rules in terms of the tokens.
The user enters the query in the form of an expression. An expression is defined as a series of
one or more query terms joined by the boolean operators and or or.

To express this, we need to write a parsing rule, also known as a production. Write the production
in Listing 5 into the JavaCC UQLParser.JJ file.

Listing 5. The expression() production


void expression() :
{
}
{ queryTerm()
(
( <AND> | <OR> )
queryTerm() )*

When Javacc is run against a .jj file, productions are translated into methods. The first set of
braces will enclose any declarations that will be needed by the production method. For now, this
is left blank. The second set of braces encloses the production rule written in a way that JavaCC
understands. Note the use of the AND and OR tokens that were defined earlier. Also note that
queryTerm() is written as a method call. queryTerm() in fact, is another production method.

Now, let us define the queryTerm() production. queryTerm() is either a single criterion (such as
title="The Matrix") or an expression surrounded by parentheses. queryTerm() is defined in JavaCC
recursively, via expression(), which allows you to sum up the language succinctly using the code
shown in Listing 6.

Listing 6. The queryTerm() production method in JavaCC (UQLParser.jj)


void queryTerm() :
{
}
{
(<TITLE> | <ACTOR> |
<DIRECTOR> | <KEYWORD>)
( <EQUALS> | <NOTEQUAL>)
( <STRING> | <QUOTED_STRING> )
|
<LPAREN> expression() <RPAREN>
}

That's all the rules we need. The entire language parser has been summed up in two productions.

Use JavaCC to build a user friendly boolean query language Page 6 of 21


ibm.com/developerWorks/ developerWorks®

Taking JavaCC for a test drive


At this point, you should have a valid JavaCC file. You can compile and "run" this program to see if
your parser operates correctly before moving on to step 5.

The ZIP file provided with this paper should include the sample stage 1 JavaCC file, UQLParser.jj.
Unzip the entire ZIP file into an empty directory. To compile stage1/UQLParser.jj, first you'll need to
download JavaCC and install according to the directions at the JavaCC Web page. For simplicity's
sake, be sure to put the executable path for Javacc.bat into your PATH environment variable.
Compilation is easy; cd into the directory where you've unloaded UQLParser.jj and enter the
following.

javacc ?debug_parser ?output_directory=.\com\demo\stage1 UQLParser.jj

You may also use the enclosed Ant file, build.xml if you prefer. You will have to make some
adjustments to the top of the properties file to point to your JavaCC installation. JavaCC should
produce the messages shown in Listing 7 the first time you run it.

Listing 7. Output from compilation of UQLParser.jj


Java Compiler Compiler Version 3.2 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file UQLParser.jj . . .
File "TokenMgrError.java" does not exist. Will create one.
File "ParseException.java" does not exist. Will create one.
File "Token.java" does not exist. Will create one.
File "SimpleCharStream.java" does not exist. Will create one.
Parser generated successfully.

In addition to the four files mentioned, JavaCC will also produce UQLParser.java,
UQLParserConstants.java, and UQLParserTokenManager.java. All of these are written to the
com\demo\stage1 directory. From here, you should be able to compile these files without any
additions to the default runtime classpath. The Ant file default target will perform the Java compile
automatically if the JavaCC step runs successfully. If not, you can compile your files from the top
level directory (JavaCCPaper/stage1) with

javac ?d bin src\com\demo\stage1\*.java

Once you have Java class files in place, you may test your new parser by entering the following
sample user query into the "main" java method you have defined. If you are using the same code,
begin from the JavaCCPaper/stage1 directory and enter the following on the command line.

java ?cp bin com.demo.stage1.UQLParser "actor = \"Tom Cruise\""

The ?debug_parser option that we used during the JavaCC step ensures that the following useful
trace messages are output, indicating how the user query is parsed. The output should appear as
in Listing 8.

Use JavaCC to build a user friendly boolean query language Page 7 of 21


developerWorks® ibm.com/developerWorks/

Listing 8. Output from UQLParser, query actor="Tom Cruise"


Call: parse
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>:
""Tom Cruise"">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

To test the recursive path for parenthesized expressions, try the following test.

java ?cp bin com.demo.stage1.UQLParser "(actor=\"Tom Cruise\" or actor=\"Kelly McGillis


\") and keyword=drama"

This should produce the output in listing 9.

Listing 9. Output from UQL1Parser, query (actor="Tom Cruise" or actor="Kelly


McGillis") and keyword=drama
Call: parse
Call: expression
Call: queryTerm
Consumed token: <"(">
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>:
""Tom Cruise"">
Return: queryTerm
Consumed token: <"OR">
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>:
""Kelly McGillis"">
Return: queryTerm
Return: expression
Consumed token: <")">
Return: queryTerm
Consumed token: <"AND">
Call: queryTerm
Consumed token: <"keyword">
Consumed token: <"=">
Consumed token: <<STRING>: "drama">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

This output is useful because it illustrates the recursion through queryTerm and expression. The
first instance of queryTerm is actually an expression made up of two queryTerms. Figure 1 shows a
pictorial view of this parse path.

Use JavaCC to build a user friendly boolean query language Page 8 of 21


ibm.com/developerWorks/ developerWorks®

Figure 1. Graphical representation of parse of user query

If you're curious as to what Java code was produced by JavaCC, by all means, have a look (but
don't try to change any of it!). Here is what you'll find.

UQLParser.java ? In this file, you will find the code that you placed between PARSER_BEGIN and
PARSER_END in your UQLParser.jj file. You will also discover that your JJ production methods
have been mutated into Java methods.

For example, expression() rule has been expanded into the code found in Listing 10.

Listing 10. UQLParser.java


static final public void expression() throws ParseException {
trace_call("expression");
try {
queryTerm();
label_1:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case AND:
case OR:
;
break;
default:
jj_la1[0] = jj_gen;
break label_1;
}
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case AND:
jj_consume_token(AND);
break;
case OR:
jj_consume_token(OR);

Use JavaCC to build a user friendly boolean query language Page 9 of 21


developerWorks® ibm.com/developerWorks/

break;
default:
jj_la1[1] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
queryTerm();
}
} finally {
trace_return("expression");
}
}

It bears some resemblance to what you originally wrote in that queryTerm(), AND, and OR make
an appearance but the rest is parsing detail that JavaCC has added around it.

UQLParserConstants.java ? This file is easy to figure out. All of the tokens that you have defined
are here. JavaCC has merely listed them in an array and provided integer constants to reference
into that array. Listing 11 shows the contents of UQLParserConstants.java

Listing 11. UQLParserConstants.java


/* Generated By:JavaCC: Do not edit this line.
UQLParserConstants.java */

package com.demo.stage1;

public interface UQLParserConstants {

int EOF = 0;
int AND = 5;
int OR = 6;
int TITLE = 7;
int ACTOR = 8;
int DIRECTOR = 9;
int KEYWORD = 10;
int LPAREN = 11;
int RPAREN = 12;
int EQUALS = 13;
int NOTEQUAL = 14;
int STRING = 15;
int QUOTED_STRING = 16;

int DEFAULT = 0;

String[] tokenImage = {
"<EOF>",
"\" \"",
"\"\\t\"",
"\"\\r\"",
"\"\\n\"",
"\"and\"",
"\"or\"",
"\"title\"",
"\"actor\"",
"\"director\"",
"\"keyword\"",
"\"(\"",
"\")\"",
"\"=\"",
"\"<>\"",
"<STRING>",
"<QUOTED_STRING>",

Use JavaCC to build a user friendly boolean query language Page 10 of 21


ibm.com/developerWorks/ developerWorks®

};

UQLParserTokenManager.java ? This is one scary file. JavaCC uses this class as the tokenizer.
This is the piece of code that determines what the tokens are. The primary routine of interest here
is GetNextToken. This is the routine used by the parser production methods to determine which
path to take.

SimpleCharStream.java ? This file is used by UQLParserTokenManager to represent the ASCII


stream of characters to be parsed.

Token.java ? The Token class is provided to represent the tokens themselves. The next part of
this paper will demonstrate the usefulness of the Token class.

TokenMgrError.java and ParseException ? These classes represent exception conditions in the


tokenizer and parser respectively.

Stage 2 - Adding behavior to JavaCC code


Note: For this part of the tutorial, refer to the stage2 subdirectory of the code. The JJ file that is
presented from here on is JavaCCPaper/stage2/UQLParser.jj. In order to run your sample SQL
queries, you should also create the MOVIEDB database using the enclosed moviedb.sql file.
Execute the DDL using db2 -tf moviedb.sql.

Now that we've parsed, we need to take action on the individual expressions. The goal of this
stage is to produce a runnable DB2 SQL query that will return what the user expects.

The process should start with a boilerplate SELECT containing a blank spot where the parser will
fill in the rest. The SELECT template appears in Listing 12. The query produced by the parser may
not be as optimal as a human DBA might write, but it will return the correct results expected by the
end user.

Listing 12. The SELECT statement


SELECT TITLE, DIRECTOR
FROM MOVIE
WHERE MOVIE_ID IN
(
? parser will fill in here?.
);

What the parser fills in depends on the path it takes through the tokenizer. For example, if the user
enters the query from above:

(actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama"

then the parser should emit text into the missing portion of the SQL query according to Figure
2. It will echo the parentheses, enter subqueries for the terminal queryTerms and substitute
INTERSECT for AND and UNION for OR.

Use JavaCC to build a user friendly boolean query language Page 11 of 21


developerWorks® ibm.com/developerWorks/

Figure 2. Parser output to SQL query

This would ensure that the SQL query emitted for

(actor = "Tom Cruise" or actor = "Kelly McGillis") and keyword=drama

would appear as in Listing 13.

Listing 13. The complete SELECT statement


SELECT TITLE, DIRECTOR
FROM MOVIE
WHERE MOVIE_ID IN
(
(
SELECT MOVIE_ID
FROM ACTOR
WHERE NAME=?Tom Cruise?
UNION
SELECT
MOVIE_ID
FROM ACTOR
WHERE NAME=?Kelly McGillis?
)
INTERSECT
SELECT MOVIE_ID
FROM KEYWORD
WHERE KEYWORD=?drama?
);

As mentioned earlier, there are probably shorter, more optimal ways to write this particular
query but this SQL will produce correct results. Generally, the DB2 optimizer can smooth out
performance deficiencies.

Use JavaCC to build a user friendly boolean query language Page 12 of 21


ibm.com/developerWorks/ developerWorks®

So, what needs to be added to the JavaCC source to produce this? You must add actions
and other supporting code to the defined grammar. An action is Java code that is executed in
response to a particular production. Before adding actions, first add a method that will return
the completed SQL to the caller. To do this, add a method called getSQL() to the top part of the
JavaCC file. You should also add the private StringBuffer sqlSB to the parser's internal members.
This variable will represent the current SQL string at any stage of the parsing. Listing 14 shows the
PARSER_BEGIN/PARSER_END section of UQLParser.jj. Finally, add some code in your main()
test method to print out and execute the generated SQL query.

Listing 14. PARSER_BEGIN/PARSER_END section


PARSER_BEGIN(UQLParser)

package com.demo.stage2;

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;

import java.sql.Statement;
import java.lang.StringBuffer;
import java.io.StringReader;
import java.io.Reader;

public class UQLParser {

private static StringBuffer sqlSB;


// internal SQL representation.

public UQLParser(String s)
{
this((Reader)(new StringReader(s)));
sqlSB = new StringBuffer();
}

public String getSQL()


{
return sqlSB.toString();
}

public static void main(String args[])


{
try
{
String query = args[0];
UQLParser parser =
new UQLParser(query);
parser.parse();
System.out.println("\nSQL Query: " +
parser.getSQL());

// Note: This code assumes a


// default connection
// (current userid and password).
System.out.println("\nResults of Query");

Class.forName(
"COM.ibm.db2.jdbc.app.DB2Driver"
).newInstance();
Connection con =
DriverManager.getConnection(
"jdbc:db2:moviedb");

Use JavaCC to build a user friendly boolean query language Page 13 of 21


developerWorks® ibm.com/developerWorks/

Statement stmt =
con.createStatement();
ResultSet rs =
stmt.executeQuery(parser.getSQL());
while(rs.next())
{
System.out.println("Movie Title = " +
rs.getString("title") +
" Director = " +
rs.getString("director"));
}
rs.close();
stmt.close();
con.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
PARSER_END(UQLParser)

Now fill in the actions taken by the parser. We'll start with the easy one first. When an expression
is being parsed, the parser should emit the word "INTERSECT" whenever it parses "AND" and
"UNION" whenever it parses "OR". To do that, insert self-contained blocks of Java code after the
<AND> and <OR> tokens in the expression production. The code should append INTERSECT or
UNION to the sqlSB StringBuffer. This is illustrated in Listing 15.

Listing 15. Actions taken for expressions


void expression() :
{
}
{ queryTerm()
(
( <AND>
{ sqlSB.append("\nINTERSECT\n"); }
| <OR>
{ sqlSB.append("\nUNION\n"); }
)
queryTerm() )*
}

Several actions need to be taken within the queryTerm()production. These tasks are as follows:

1. Map the search names to their respective DB2 tables and columns
2. Save the comparator token.
3. Put the comparands into a form that DB2 can understand, for example remove the double
quotes from QUOTED_STRING tokens
4. Output the proper subselect into sqlSB
5. For the recursive expression case, emit the parentheses as is.

For all of these tasks, you will need some local variables. These are defined between the first pair
of braces in the production, as shown in Listing 16.

Use JavaCC to build a user friendly boolean query language Page 14 of 21


ibm.com/developerWorks/ developerWorks®

Listing 16. Local variables for queryTerm()


void queryTerm() :
{
Token tSearchName, tComparator, tComparand;
String sComparand, table, columnName;
}

The first task can be accomplished with the code in Listing 17. Set the proper DB2 table and
column associated with the token encountered.

Listing 17. Mapping search names to DB2


(
<TITLE> {table = "movie";
columnName = "title"; } |
<DIRECTOR> {table = "movie";
columnName = "director"; } |
<KEYWORD> {table = "keyword";
columnName = "keyword"; } |
<ACTOR> {table = "actor";
columnName = "name"; }
)

The second task can be accomplished with the code in Listing 18. Save the token so that it can be
used in the SQL buffer.

Listing 18. Saving the comparator


( tComparator=<EQUALS> |
tComparator=<NOTEQUAL> )

The third task can be accomplished with the code in Listing 19. Set the comparand value
accordingly, stripping the double quotes from the QUOTED_STRING token if necessary.

Listing 19. Preparing the comparands


tComparand=<STRING> {
sComparand = tComparand.image; }
|
tComparand=<QUOTED_STRING>
{ // need to get rid of quotes.
sComparand =
tComparand.image.substring(1,
tComparand.image.length() - 1);
}

The fourth task can be accomplished with the code in Listing 20. The complete query term is
appended to the sql buffer.

Listing 20. Writing the SQL expressions


{
sqlSB.append("SELECT MOVIE_ID FROM ").append(table);
sqlSB.append("\nWHERE ").append(columnName);
sqlSB.append(" ").append(tComparator.image);
sqlSB.append(" '").append(sComparand).append("'");
}

Use JavaCC to build a user friendly boolean query language Page 15 of 21


developerWorks® ibm.com/developerWorks/

Finally, for the case of recursive expressions, the parser should simply echo parentheses when it
sees them within the expression recursion, as illustrated in Listing 21.

Listing 21. Echoing parentheses


<LPAREN>
{ sqlSB.append("("); }
expression()
<RPAREN>
{ sqlSB.append(")"); }

Listing 22 shows the completed queryTerm() production.

Listing 22. The complete queryTerms() production


/**
* Query terms may consist of a parenthetically
* separated expression or may be a query criteria
* of the form queryName = something or
* queryName <> something.
*
*/
void queryTerm() :
{
Token tSearchName, tComparator, tComparand;
String sComparand, table, columnName;
}
{
(
<TITLE> {table = "movie";
columnName = "title"; } |
<DIRECTOR> {table = "movie";
columnName = "director"; } |
<KEYWORD> {table = "keyword";
columnName = "keyword"; } |
<ACTOR> {table = "actor";
columnName = "name"; }
)

( tComparator=<EQUALS> |
tComparator=<NOTEQUAL> )

(
tComparand=<STRING>
{ sComparand = tComparand.image; } |
tComparand=<QUOTED_STRING>
{ // need to get rid of quotes.
sComparand = tComparand.image.substring(1,
tComparand.image.length() - 1);
}
)

{
sqlSB.append("SELECT MOVIE_ID FROM ").append(table);
sqlSB.append("\nWHERE ").append(columnName);
sqlSB.append(" ").append(tComparator.image);
sqlSB.append(" '").append(sComparand).append("'");
}
|
<LPAREN>
{ sqlSB.append("("); }
expression()
<RPAREN>
{ sqlSB.append(")"); }
}

Use JavaCC to build a user friendly boolean query language Page 16 of 21


ibm.com/developerWorks/ developerWorks®

Compile and run UQLParser.jj as before. Visit UQLParser.java and note how your production rules
have been neatly inserted into the generated code. An example of the expanded expression()
method is shown in Listing 23. Note the code after the jj_consume_token calls.

Listing 23. expression() method in UQLParser.java


/**
* An expression is defined to be a queryTerm followed by zero
* or more query terms joined by either an AND or an OR. If two
* query terms are joined with * AND then both conditions must
* be met. If two query terms are joined with an OR, then
* one of the two conditions must be met.
*/
static final public void expression() throws ParseException {
trace_call("expression");
try {
queryTerm();
label_1:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case AND:
case OR:
;
break;
default:
jj_la1[0] = jj_gen;
break label_1;
}
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case AND:
jj_consume_token(AND);
sqlSB.append("\nINTERSECT\n");
break;
case OR:
jj_consume_token(OR);
sqlSB.append("\nUNION\n");
break;
default:
jj_la1[1] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
queryTerm();
}
} finally {
trace_return("expression");
}
}

Run this code as before. You will have to include db2java.zip in the CLASSPATH. This time, when
you run

java ?cp bin;c:/sqllib/db2java.zip com.demo.stage2.UQLParser "(actor=\"Tom Cruise\" or


actor=\"Kelly McGillis\") and keyword=drama"

it should produce the output in listing 24.

Listing 24. Output from UQL2Parser, query (actor="Tom Cruise" or


actor="Kelly McGillis") and keyword=drama
Call: parse

Use JavaCC to build a user friendly boolean query language Page 17 of 21


developerWorks® ibm.com/developerWorks/

Call: expression
Call: queryTerm
Consumed token: <"(">
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>:
""Tom Cruise"">
Return: queryTerm
Consumed token: <"or">
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>:
""Kelly McGillis"">
Return: queryTerm
Return: expression
Consumed token: <")">
Return: queryTerm
Consumed token: <"and">
Call: queryTerm
Consumed token: <"keyword">
Consumed token: <"=">
Consumed token: <<STRING>: "drama">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

SQL Query: SELECT TITLE,DIRECTOR


FROM MOVIE
WHERE MOVIE_ID IN (
(SELECT MOVIE_ID FROM actor
WHERE name = 'Tom Cruise'
UNION
SELECT MOVIE_ID FROM actor
WHERE name = 'Kelly McGillis')
INTERSECT
SELECT MOVIE_ID FROM keyword
WHERE keyword = 'drama')

Results of Query
Movie Title = Top Gun Director = Tony Scott
Movie Title = Witness Director = Peter Weir

Try a few more queries to get used to your parser. Try a query using the NOTEQUAL token, as in
actor<>"Harrison Ford". Try some illegal queries like "title=" to see what happens. With very few
lines of JavaCC code, you have produced a very effective end-user query language.

Final Points to Consider


JavaCC, in addition to providing the parser builder, also provides the JJDOC tool for documenting
your grammar in Backus-Naur Form. JJDOC can make it easy for you to provide your end users
with a description of the language they are using. The ant file provided in the accompanying code
has a "bnfdoc" target as an example.

JavaCC also provide a tool called JJTree. This tool provides tree and node classes which make
it easy for you to partition your code into separate parsing and action classes. Moving forward
with this example, you might consider writing a modest optimizer for your query to eliminate
unnecessary INTERSECTs and UNIONS. You could accomplish this by visiting the nodes of

Use JavaCC to build a user friendly boolean query language Page 18 of 21


ibm.com/developerWorks/ developerWorks®

the parse tree and consolidating similar adjacent nodes (actor="Tom Cruise" and actor="Kelly
McGillis", for example).

JavaCC has a rich library of grammars. Before writing a parser yourself, be sure to look in
JavaCC's examples directory for a possible ready-built solution.

Be sure to read the Frequently Asked Questions on the JavaCC Web page and visit the javacc
newsgroup at comp.compilers.tools.javacc to get a better understanding of all the capabilities and
features of JavaCC.

Conclusion
JavaCC is a robust tool that can be used to define grammars and easily incorporate the parsing
and exception handling for that grammar into your Java business application. With this article, we
have demonstrated that JavaCC can be used to provide a powerful, yet simple query language for
end users of a database system.

Use JavaCC to build a user friendly boolean query language Page 19 of 21


developerWorks® ibm.com/developerWorks/

Downloadable resources
Description Name Size
JavaCCPaper.zip ( HTTP | FTP ) 8.47KB

Use JavaCC to build a user friendly boolean query language Page 20 of 21


ibm.com/developerWorks/ developerWorks®

Related topics
• YACC: Yet Another Compiler Compiler The very first widely available "compiler compiler" is
described in a paper by Johnson, Stephen C, AT&T Bell Laboratories, Murray Hill, New Jersey
07974
• Build your own Languages with JavaCC is a very good JavaWorld article written by Ensileng,
Oliver.

© Copyright IBM Corporation 2004


(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)

Use JavaCC to build a user friendly boolean query language Page 21 of 21

You might also like