Lex
The main job of a lexical analyzer (scanner) is to break up an input stream into
more usable elements (tokens).
For example, the statement a = b + c * d; is broken into the tokens
ID ASSIGN ID PLUS ID MULT ID SEMI
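To make the example above concrete, here is a minimal hand-written scanner in C that produces the same token sequence. It is only a sketch of what a scanner does, not the code Lex generates; the token codes and the function names next_token and token_name are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; the names follow the example above. */
enum token { TOK_EOF = 0, ID, ASSIGN, PLUS, MULT, SEMI, TOK_ERROR };

/* Return the next token starting at input[*pos] and advance *pos past its lexeme. */
enum token next_token(const char *input, size_t *pos)
{
    /* Skip blanks, tabs and newlines between tokens. */
    while (isspace((unsigned char)input[*pos]))
        (*pos)++;

    char c = input[*pos];
    if (c == '\0')
        return TOK_EOF;

    /* An identifier: a letter followed by letters or digits. */
    if (isalpha((unsigned char)c)) {
        while (isalnum((unsigned char)input[*pos]))
            (*pos)++;
        return ID;
    }

    (*pos)++;   /* every remaining token in this sketch is one character long */
    switch (c) {
    case '=': return ASSIGN;
    case '+': return PLUS;
    case '*': return MULT;
    case ';': return SEMI;
    default:  return TOK_ERROR;
    }
}

/* Printable name of a token code, for display. */
const char *token_name(enum token t)
{
    static const char *const names[] = {
        "EOF", "ID", "ASSIGN", "PLUS", "MULT", "SEMI", "ERROR"
    };
    return names[t];
}
```

Calling next_token repeatedly on "a = b + c * d;" with a position counter that starts at 0 yields ID, ASSIGN, ID, PLUS, ID, MULT, ID, SEMI and then TOK_EOF.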
Lex is a tool for writing lexical analyzers.
Lex Source Program
An input file, which we call lex.l, is written in the Lex language and describes the lexical
analyzer to be generated.
The Lex compiler transforms lex.l to a C program, in a file that is always named lex.yy.c.
The latter file is compiled by the C compiler into a file called a.out, as always.
The C-compiler output is a working lexical analyzer that can take a stream of input
characters and produce a stream of tokens.
The generated analyzer is a C function, yylex(), that returns an integer, which is a code for
one of the possible token names.
The attribute value, whether it be another numeric code, a pointer to the symbol table, or
nothing, is placed in a global variable yylval, which is shared between the lexical
analyzer and parser, thereby making it simple to return both the name and an attribute
value of a token.
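The convention described above (token code as the return value, attribute in yylval) can be sketched with a small hand-written stand-in for the scanner. This is not Lex output: the token codes, the int type of yylval, and the helper names scan_string and next_token_code are assumptions made for this illustration.

```c
#include <ctype.h>

/* Hypothetical token codes, chosen above 255 so they cannot collide with
   single-character codes; a parser generator would normally supply them. */
enum { NUMBER = 258, ID = 259 };

/* Attribute of the most recent token. A plain int in this sketch; in a real
   Lex/Yacc pair the parser decides its type. */
int yylval;

/* Input cursor for this sketch (assumption: a generated scanner reads
   from an input stream instead of a string). */
static const char *cursor = "";

void scan_string(const char *text) { cursor = text; }

/* Stand-in for the scanner function: returns a token code (0 at end of
   input) and leaves the attribute value in yylval. */
int next_token_code(void)
{
    while (*cursor == ' ' || *cursor == '\t' || *cursor == '\n')
        cursor++;
    if (*cursor == '\0')
        return 0;

    if (isdigit((unsigned char)*cursor)) {
        /* Attribute: the numeric value of the literal. */
        yylval = 0;
        while (isdigit((unsigned char)*cursor))
            yylval = yylval * 10 + (*cursor++ - '0');
        return NUMBER;
    }
    if (isalpha((unsigned char)*cursor)) {
        /* Attribute: index of the first letter (0 for 'a'), standing in
           for a pointer to a symbol-table entry. */
        yylval = tolower((unsigned char)*cursor) - 'a';
        while (isalnum((unsigned char)*cursor))
            cursor++;
        return ID;
    }
    /* Any other character is returned as its own code, with no attribute. */
    yylval = 0;
    return (unsigned char)*cursor++;
}
```

After scan_string("x = 42;"), successive calls return ID, '=', NUMBER, ';' and 0. A caller reads yylval immediately after each call, which is how a parser would pick up the attribute.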
Structure of Lex Programs
A Lex program has three sections, separated by lines containing only %%:
definitions
%%
rules
%%
user subroutines
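The following characters are operators with a special meaning in Lex patterns:
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
To use one of them as a text character, escape it with a backslash: \$ matches "$"
and \\ matches "\".
Every character other than blank, tab (\t), newline (\n) and those in the list above is always
a text character.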
Designing patterns
[ ] Brackets are used to denote a character class, which matches any single
character within the brackets. If the first character is a ‘^’, this negates the
brackets causing them to match any character except those listed. The ‘-’ can be
used in a set of brackets to denote a range.
" " Match everything within the quotes literally; characters inside the quotes lose
their special meanings.
( ) Group everything in the parentheses as a single unit for the rest of the
expression.
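The bracket rules above (set membership, a leading ^ for negation, - for ranges) can be sketched as a small C function. This is an illustration of the semantics only, not how Lex implements character classes; the function name in_class is an assumption, and backslash escapes inside the brackets are not handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Test whether c belongs to the class written between the brackets,
   e.g. "a-z0-9_" for [a-z0-9_] or "^ \t" for [^ \t]. */
bool in_class(const char *spec, char c)
{
    bool negate = false;
    bool found = false;
    unsigned char uc = (unsigned char)c;

    /* A leading '^' negates the whole class. */
    if (spec[0] == '^') {
        negate = true;
        spec++;
    }
    for (size_t i = 0; spec[i] != '\0'; i++) {
        if (spec[i + 1] == '-' && spec[i + 2] != '\0') {
            /* A '-' between two characters denotes a range lo-hi. */
            if (uc >= (unsigned char)spec[i] && uc <= (unsigned char)spec[i + 2])
                found = true;
            i += 2;   /* skip the '-' and the upper bound */
        } else if (c == spec[i]) {
            /* Any other character (including a leading or trailing '-')
               stands for itself. */
            found = true;
        }
    }
    return negate ? !found : found;
}
```

For instance, in_class("a-z", 'm') is true, in_class("^0-9", '5') is false, and in_class(" \t", '\t') is true.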
Character Classes []
[ \t] matches either a space or a tab character.
Pattern    Matches
(x)        x
x/y        x, but only if followed by y
{xx}       the translation of xx from the definitions section
x$         x at the end of a line
^x         x at the beginning of a line
Levels of precedence, from highest to lowest:
Kleene closure (*), ?, +
concatenation
alternation (|)
All operators are left associative.
Example: a*b|cd* = ((a*)b)|(c(d*))
Pattern Matching Primitives
Lex Predefined Variables
We have alluded to the two rules that Lex uses to decide on the proper lexeme to
select, when several prefixes of the input match one or more patterns:
1. Always prefer a longer prefix to a shorter prefix.
2. If the longest possible prefix matches two or more patterns, prefer the pattern listed first
in the Lex program.
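The two rules above can be sketched in C as a loop over an ordered rule table. This is not how Lex works internally (it builds a finite automaton from the patterns); the sketch only reproduces the selection policy. The rule set, the matcher functions and the name select_rule are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it
   matches (0 means no match). */
typedef size_t (*matcher)(const char *s);

static size_t match_if(const char *s) { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t match_lt(const char *s) { return s[0] == '<' ? 1 : 0; }
static size_t match_le(const char *s) { return strncmp(s, "<=", 2) == 0 ? 2 : 0; }
static size_t match_id(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]))
        while (isalnum((unsigned char)s[n]))
            n++;
    return n;
}

struct rule { const char *name; matcher match; };

/* Order matters: the keyword rule IF is listed before ID. */
static const struct rule rules[] = {
    { "IF", match_if }, { "ID", match_id }, { "LT", match_lt }, { "LE", match_le },
};

/* Return the name of the winning rule and store the lexeme length in *len;
   return NULL (and length 0) if no rule matches. */
const char *select_rule(const char *s, size_t *len)
{
    const char *best = NULL;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        /* Rule 1: a longer match replaces a shorter one.
           Rule 2: the comparison is strict, so a tie keeps the earlier rule. */
        if (n > *len) {
            *len = n;
            best = rules[i].name;
        }
    }
    return best;
}
```

On the input "if (x)" both IF and ID match two characters, so IF wins because it is listed first. On "iffy = 1" ID matches four characters and wins by length. On "<= 3" LE wins over LT by length even though LT is listed earlier.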
To run Lex on a source file, for example test1.l, type
flex test1.l
It produces a file named lex.yy.c, which is a C program for the lexical analyzer.
To compile lex.yy.c, type
gcc lex.yy.c -o test1.exe
To run the lexical analyzer program, type
test1.exe
Yacc - Yet Another Compiler Compiler
What is YACC?