
JFlex

The Fast Lexical Analyser Generator


Introduction
• JFlex is a lexical analyser generator for Java, written in Java. It is also a
rewrite of the tool JLex.
• A lexical analyser generator takes as input a specification with a set of
regular expressions and corresponding actions.
• It generates a program (a lexer) that reads input, matches it against the
regular expressions, and runs the corresponding action for each match.
• Lexers are usually the first front-end step in compilers:
• matching keywords, comments, operators, etc.,
• and generating an input token stream for parsers.
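To make the idea concrete, here is a minimal hand-written sketch in Java of what such a lexer does: at each position it tries every rule, keeps the longest match (the first rule wins ties, which is also how JFlex resolves conflicts), and runs that rule's action. The class name MiniLexer, the Rule helper, and the four sample rules are hypothetical and rely on java.util.regex; this is not the code JFlex generates, which is a table-driven DFA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; JFlex itself generates a DFA-based scanner.
class MiniLexer {
    // A rule pairs a regular expression with an action on the matched text.
    // An action that returns null means "skip this match".
    static final class Rule {
        final Pattern pattern;
        final Function<String, String> action;

        Rule(String regex, Function<String, String> action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    static final List<Rule> RULES = List.of(
        new Rule("if", text -> "IF"),
        new Rule("[a-zA-Z]+", text -> "ID(" + text + ")"),
        new Rule("[0-9]+", text -> "NUM(" + text + ")"),
        new Rule("[ \\t\\n]+", text -> null));

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : RULES) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                // The strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                // No rule matched: report one character and move on.
                tokens.add("UNKNOWN(" + input.charAt(pos) + ")");
                pos++;
                continue;
            }
            String result = best.action.apply(input.substring(pos, pos + bestLen));
            if (result != null) {
                tokens.add(result);
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if iffy 42"));
    }
}
```

In this sketch "if" is reported as IF because the keyword rule comes first, while "iffy" becomes an identifier because the identifier rule matches more characters.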
Installing JFlex
• Unzip the file you downloaded into the directory you want JFlex in.
• Edit the file bin\jflex.bat so that:
• JAVA_HOME contains the directory where your Java JDK is installed.
• JFLEX_HOME contains the directory that contains JFlex.
• Include the bin\ directory of JFlex in your path.
Lexical Specifications
• A lexical specification file for JFlex consists of three parts, separated
from each other by a single line starting with %%:

UserCode
%%
Options and declarations
%%
Lexical rules
User Code
• The first part contains user code that is copied verbatim to the
beginning of the generated source file before the scanner class
declaration.
• This is the place to put package declarations and import statements.
Options and declarations
• The second part of the lexical specification contains options and
directives to customise the generated lexer, declarations of lexical
states and macro definitions.
Some JFlex Directives
%class classname
Tells JFlex to give the generated class the name classname and to write the generated code to a file classname.java.
%public
Makes the generated class public (the class is only accessible in its own package by default).
%standalone
Creates a main function in the generated class that expects the name of an input file on
the command line and then runs the scanner on this input file.
%char
Turns character counting on. The long member variable yychar contains the number of characters (starting with 0) from the beginning
of input to the beginning of the current token.
%line
Turns line counting on. The int member variable yyline contains the number of lines (starting with 0) from the beginning of input to
the beginning of the current token.
%column
Turns column counting on. The int member variable yycolumn contains the number of characters (starting with 0) from the beginning
of the current line to the beginning of the current token.
%unicode
Causes the generated scanner to use the full Unicode input character set.
Some built-in methods and fields
long yychar
contains the current character count in the input (starting with 0, only active with the %char directive)
int yyline
contains the current line of input (starting with 0, only active with the %line directive)
int yycolumn
contains the current column of the current line (starting with 0, only active with the %column directive)
String yytext()
returns the matched input text region
int yylength()
returns the length of the matched input text region as number of Java chars (as opposed to Unicode code points).
int yystate()
returns the current lexical state of the scanner.
void yybegin(int lexicalState)
enters the lexical state lexicalState
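How the three position counters relate to each other can be illustrated with a small Java sketch. The helper below is hypothetical (it is not part of JFlex) and assumes lines end with '\n' only; it derives zero-based line and column numbers from a zero-based character offset, following the same conventions as yychar, yyline and yycolumn.

```java
// Hypothetical helper, not part of JFlex: derives zero-based line and column
// numbers from a zero-based character offset, assuming '\n' line endings.
class Positions {
    static int[] lineAndColumn(String text, int offset) {
        int line = 0;
        int lineStart = 0; // offset of the first character of the current line
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new int[] { line, offset - lineStart };
    }

    public static void main(String[] args) {
        String text = "baaa ba\nbaa";
        int[] pos = lineAndColumn(text, 8); // offset of "baa" on the second line
        System.out.println("line " + pos[0] + ", column " + pos[1]);
    }
}
```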
Specification file:

/* This is my first JFlex Specification file for lexical analysis
Example: Sheep of the sheep language */

// This is the user code - I will leave it empty

%%
/* Options and Declaration part */
%class SheepLanguage
%char
%line
%column
%standalone

//Define regular expressions
word = ba+

%%
/* this is the third part (Lexical rules) */

{word} {System.out.println("Match found " + yytext() + " Length: "
+ yylength() + " at location " + yychar);}
\n {}
. {}

Input file:

baaa ba baaa
hello
this is the sheep language
ba baa baaa aabaabaaaabababaaaabbb

Output:

Match found baaa Length: 4 at location 0
Match found ba Length: 2 at location 5
Match found baaa Length: 4 at location 8
Match found ba Length: 2 at location 50
Match found baa Length: 3 at location 53
Match found baaa Length: 4 at location 57
Match found baa Length: 3 at location 64
Match found baaaa Length: 5 at location 67
Match found ba Length: 2 at location 72
Match found ba Length: 2 at location 74
Match found baaaa Length: 5 at location 76
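As a rough cross-check of the first matches in the output above, the same word pattern can be run with java.util.regex. This sketch is not what JFlex generates; the class name SheepRegexDemo is made up, and Matcher.find() silently skips characters that do not start a match, which plays the role of the two empty rules for \n and . in the specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated SheepLanguage scanner.
class SheepRegexDemo {
    static final Pattern WORD = Pattern.compile("ba+");

    static List<String> scan(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            // m.group() plays the role of yytext(), its length of yylength(),
            // and m.start() of yychar.
            lines.add("Match found " + m.group() + " Length: "
                    + m.group().length() + " at location " + m.start());
        }
        return lines;
    }

    public static void main(String[] args) {
        scan("baaa ba baaa").forEach(System.out::println);
    }
}
```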
Using Command line
Step 1: Generate the scanner class
>jflex SpecsFileName.flex
This will generate the Java class (classname.java)

Step 2: Compile Java class
>javac classname.java

Step 3: run using input file
>java classname inputfile
>java classname inputfile > output.txt

Example session:

JFlex_examples>jflex MyFirstJfelxSpecsFile.flex
Reading "MyFirstJfelxSpecsFile.flex"
Constructing NFA : 14 states in NFA
Converting NFA to DFA :
.....
7 states before minimization, 5 states in minimized DFA
Old file "SheepLanguage.java" saved as "SheepLanguage.java~"
Writing code to "SheepLanguage.java"
JFlex_examples>javac SheepLanguage.java
JFlex_examples>java SheepLanguage input.txt
Match found baaa Length: 4 at location 0 at line: 0 at column: 0
Match found ba Length: 2 at location 5 at line: 0 at column: 5
Match found baaa Length: 4 at location 8 at line: 0 at column: 8
Match found ba Length: 2 at location 50 at line: 3 at column: 0
Match found baa Length: 3 at location 53 at line: 3 at column: 3
JFlex_examples>java SheepLanguage input.txt >results.txt
TINY Programming Language
• Simple toy language
• Pascal-like syntax
• If-then-end,
• if-then-else-end,
• repeat-until,
• assignment,
• read,
• write.

{Factorial Program in TINY}
read x;
if x > 0 then
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1;
  until x = 0;
  write fact;
end
Language Features
• Semicolons as separators but not terminators.
• Integer variables only; no declarations.
• Arithmetic expressions: variables, constants, +, -, *, /, (, )
• Relational (Boolean) expressions: comparison of two arithmetic expressions
using < or =
• read, write perform simple input/output.
• Comments enclosed in {}
TINY’s tokens
• Reserved words: if, then, else, end, repeat, until, read, write
• Special symbols: +, -, *, /, =, <, (, ), :=, ;
• Number: one or more digits
• Identifier: one or more letters
• Comment: any sequence of symbols other than { and }, enclosed in {...}
TINY Programming Language
Sample input:

{Factorial Program in TINY}
read x;
if x > 0 then
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1;
  until x = 0;
  write fact;
end

Running the scanner:

>jflex Tinyscanner1.flex
>javac Tinyscanner.java
>java Tinyscanner sample.txt
READ
ID(x)
SEMI
IF
ID(x)
GT
NUM(0)
THEN
ID(fact)
ASSIGN
NUM(1)
SEMI
REs
• digit = [0-9]
• number = {digit}+
• letter = [a-zA-Z]
• identifier = {letter}+
• newline = \n
• whitespace = [\t ]+
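Macro references such as {digit} stand for their definitions when they appear inside other expressions. The Java sketch below imitates that substitution so the resulting patterns can be tried with java.util.regex. The MacroExpander class is hypothetical and is not how JFlex is implemented; it wraps each expansion in a non-capturing group so that an operator like + applies to the whole macro.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of macro substitution; not JFlex's actual implementation.
class MacroExpander {
    // Matches a macro reference such as {digit}.
    private static final Pattern REF =
            Pattern.compile("\\{([a-zA-Z_][a-zA-Z0-9_]*)\\}");
    private final Map<String, String> macros = new LinkedHashMap<>();

    // Stores a macro with references to earlier macros already expanded.
    void define(String name, String regex) {
        macros.put(name, expand(regex));
    }

    // Replaces each {name} with its definition wrapped in a non-capturing group.
    String expand(String regex) {
        Matcher m = REF.matcher(regex);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String body = macros.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("undefined macro: " + m.group(1));
            }
            out.append(regex, last, m.start()).append("(?:").append(body).append(')');
            last = m.end();
        }
        return out.append(regex, last, regex.length()).toString();
    }

    // The digit, number, letter and identifier macros from the list above.
    static MacroExpander tinyMacros() {
        MacroExpander e = new MacroExpander();
        e.define("digit", "[0-9]");
        e.define("number", "{digit}+");
        e.define("letter", "[a-zA-Z]");
        e.define("identifier", "{letter}+");
        return e;
    }

    public static void main(String[] args) {
        String number = tinyMacros().expand("{number}");
        System.out.println(number);
        System.out.println(Pattern.matches(number, "2024"));
    }
}
```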
Rules for reserved words
"if" {System.out.println("IF");}
"then" {System.out.println("THEN");}
"else" {System.out.println("ELSE");}
"repeat" {System.out.println("REPEAT");}
"until" {System.out.println("UNTIL");}
"read" {System.out.println("READ");}
"write" {System.out.println("WRITE");}
":=" {System.out.println("ASSIGN");}
"=" {System.out.println("EQ");}
"<" {System.out.println("LT");}
">" {System.out.println("GT");}
Rules for numbers and identifiers

{number} {System.out.printf("NUM(%d)\n",
Integer.parseInt(yytext()));}
{identifier} {System.out.printf("ID(%s)\n",
yytext());}
Whitespaces, comments, newlines, and others
{whitespace} {/* skip white spaces */}
\{[^}]*\} {/* skip comments */}
{newline} {/* skip new lines */}
. {System.out.printf("UNKNOWN SYMBOL(%s)\n", yytext());}

• Simply skip whitespaces, comments, and newlines.


• The last rule matches any symbol that is not matched by the other
rules, e.g. symbols like #, @, %
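The rules above can be approximated in plain Java for experimentation without running JFlex. This is a sketch under stated assumptions, not the generated scanner: the class name TinyRegexSketch is made up, and because Java's regex alternation takes the first alternative that matches rather than the longest one, the reserved-word alternative uses a negative lookahead so that a longer identifier is not split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of the TINY rules using java.util.regex.
class TinyRegexSketch {
    // One alternative per group of rules, tried left to right at each position.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<ws>[ \\t\\r\\n]+)"
        + "|(?<comment>\\{[^}]*\\})"
        + "|(?<kw>(?:if|then|else|repeat|until|read|write)(?![a-zA-Z]))"
        + "|(?<num>[0-9]+)"
        + "|(?<id>[a-zA-Z]+)"
        + "|(?<sym>:=|=|<|>)"
        + "|(?<other>.)");

    static String symbolName(String symbol) {
        switch (symbol) {
            case ":=": return "ASSIGN";
            case "=":  return "EQ";
            case "<":  return "LT";
            default:   return "GT";
        }
    }

    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("ws") != null || m.group("comment") != null) {
                continue; // skipped, like the empty actions in the rules
            }
            if (m.group("kw") != null) {
                out.add(m.group("kw").toUpperCase(Locale.ROOT));
            } else if (m.group("num") != null) {
                out.add("NUM(" + Integer.parseInt(m.group("num")) + ")");
            } else if (m.group("id") != null) {
                out.add("ID(" + m.group("id") + ")");
            } else if (m.group("sym") != null) {
                out.add(symbolName(m.group("sym")));
            } else {
                out.add("UNKNOWN SYMBOL(" + m.group("other") + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        scan("{sample} if x > 0 then fact := 1").forEach(System.out::println);
    }
}
```

Only the rules listed in this section are modelled, so any other character, including the semicolon, is reported as an unknown symbol by this sketch.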
More Sophisticated Version – TinyScanner2
%%
%class TinyScanner
%function nextToken
%type TinyToken
….
%%
…
"if" {return new TinyToken(TinyToken.TokenKind.RW_IF);}

• Facilitates integration with other compiler elements.
• (Most) actions return.
• JFlex creates a "read the next token" method within the generated code.
• It is named nextToken (default: yylex) and returns a TinyToken object
(null at the end of file).
• %function and %type specify these names.
Class TinyToken
public class TinyToken {
    public TinyToken(TokenKind k) { kind = k; }
    /*
    OTHER METHODS
    */
    public enum TokenKind {
        RW_IF, RW_THEN, RW_END, RW_REPEAT, RW_UNTIL,
        RW_READ, RW_WRITE, SYM_ASSIGN, SYM_EQ, SYM_LT,
        SYM_PLUS, SYM_MINUS, SYM_TIMES, SYM_OVER,
        SYM_LPAREN, SYM_RPAREN, SEMI, NUMBER, ID, ILLEGAL
    }
    private TokenKind kind;
    private int value;
    private String Spelling;
}

• Represents token data (e.g. kind etc.)
• TokenKind encodes the token specification.
• Value: numerical value for NUMBERs.
• Spelling: e.g. for IDs.
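One possible way to fill in the "OTHER METHODS" placeholder is sketched below. It is an assumption, not the original course code: the extra constructors, the getKind() accessor, and the toString() format are all hypothetical, and the field is spelled in lower case to follow Java naming conventions.

```java
// Hypothetical completion of TinyToken; constructors and toString() are assumptions.
class TinyToken {
    enum TokenKind {
        RW_IF, RW_THEN, RW_END, RW_REPEAT, RW_UNTIL,
        RW_READ, RW_WRITE, SYM_ASSIGN, SYM_EQ, SYM_LT,
        SYM_PLUS, SYM_MINUS, SYM_TIMES, SYM_OVER,
        SYM_LPAREN, SYM_RPAREN, SEMI, NUMBER, ID, ILLEGAL
    }

    private final TokenKind kind;
    private final int value;       // meaningful only for NUMBER
    private final String spelling; // meaningful only for ID and ILLEGAL (assumed)

    // Tokens that carry no extra data, e.g. reserved words and symbols.
    TinyToken(TokenKind kind) {
        this(kind, 0, null);
    }

    // NUMBER tokens carry their numerical value.
    TinyToken(int value) {
        this(TokenKind.NUMBER, value, null);
    }

    // ID and ILLEGAL tokens carry the matched text.
    TinyToken(TokenKind kind, String spelling) {
        this(kind, 0, spelling);
    }

    private TinyToken(TokenKind kind, int value, String spelling) {
        this.kind = kind;
        this.value = value;
        this.spelling = spelling;
    }

    TokenKind getKind() {
        return kind;
    }

    @Override
    public String toString() {
        switch (kind) {
            case NUMBER:
                return kind + "(" + value + ")";
            case ID:
            case ILLEGAL:
                return kind + "(" + spelling + ")";
            default:
                return kind.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(new TinyToken(TokenKind.RW_IF));
        System.out.println(new TinyToken(42));
        System.out.println(new TinyToken(TokenKind.ID, "fact"));
    }
}
```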
Using TinyScanner2
// requires: import java.io.FileReader;
TinyToken current;
TinyScanner scanner = null;
scanner = new TinyScanner(new FileReader("Tiny_program.txt"));
current = scanner.nextToken();
while (current != null) {
    System.out.printf("Token [%s]\n", current.toString());
    current = scanner.nextToken();
}
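Since the generated TinyScanner class only exists after running JFlex, the pull-style loop above can be tried out with a stand-in. In this sketch StubScanner is a hypothetical replacement that serves tokens from a fixed list and, like the nextToken() method described earlier, returns null at the end of input; tokens are plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DriverLoopDemo {
    // Hypothetical stand-in for the generated TinyScanner.
    static class StubScanner {
        private final Iterator<String> tokens;

        StubScanner(List<String> tokens) {
            this.tokens = tokens.iterator();
        }

        // Same contract as the generated method: null signals end of input.
        String nextToken() {
            return tokens.hasNext() ? tokens.next() : null;
        }
    }

    // Pulls tokens one at a time until the scanner reports end of input.
    static List<String> drain(StubScanner scanner) {
        List<String> seen = new ArrayList<>();
        String current = scanner.nextToken();
        while (current != null) {
            seen.add(String.format("Token [%s]", current));
            current = scanner.nextToken(); // advance, or the loop never ends
        }
        return seen;
    }

    public static void main(String[] args) {
        StubScanner scanner = new StubScanner(List.of("RW_READ", "ID", "SEMI"));
        drain(scanner).forEach(System.out::println);
    }
}
```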
