
Lab 1

Lexical analyzer
How to implement a lexical analyzer using the Lex tool

• Lex is a tool used to generate a lexical analyzer (scanner).

• It systematically translates regular definitions into C source code for efficient scanning.

• Lex translates a set of regular expression specifications (given as input in input_file.l) into a C implementation of a corresponding finite state machine (lex.yy.c). This C program, when compiled, yields an executable lexical analyzer.
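The idea of turning a regular expression into a finite state machine can be sketched by hand. The C code below is a minimal illustration only. It is not what Lex actually emits, because lex.yy.c uses generated transition tables. It hard-codes a DFA for the identifier pattern [A-Za-z][A-Za-z0-9]* and returns the length of the longest prefix the DFA accepts. The function name match_identifier and the state names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* States of a hand-coded DFA for the pattern [A-Za-z][A-Za-z0-9]* */
enum { START = 0, IN_ID = 1, DEAD = 2 };

/* Hypothetical helper: returns the length of the longest prefix of s that
   the DFA accepts, or 0 if no prefix is an identifier.
   (isalpha/isalnum match [A-Za-z] and [A-Za-z0-9] in the default "C" locale.) */
size_t match_identifier(const char *s)
{
    int state = START;
    size_t last_accept = 0;   /* input length at the most recent accepting state */

    for (size_t i = 0; s[i] != '\0' && state != DEAD; i++) {
        unsigned char c = (unsigned char)s[i];
        switch (state) {
        case START:
            state = isalpha(c) ? IN_ID : DEAD;
            break;
        case IN_ID:
            state = isalnum(c) ? IN_ID : DEAD;
            break;
        default:
            break;
        }
        if (state == IN_ID)   /* IN_ID is the only accepting state */
            last_accept = i + 1;
    }
    return last_accept;
}
```

For example, match_identifier("count1+x") returns 6, because the DFA leaves its accepting state at '+'. A scanner implements "longest match" by remembering the last position at which the automaton was in an accepting state.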


Generating a Lexical Analyzer using Lex
• The source program is written in the Lex language in a file with the .l extension. It is given to the Lex compiler (the lex tool), which produces lex.yy.c, a C program, as output.
• The C compiler compiles lex.yy.c and produces a.out, the executable lexical analyzer.
• a.out works as the scanner (lexical analyzer). It transforms an input stream into a sequence (stream) of tokens.
(Figure: lex.l → Lex compiler → lex.yy.c; lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens)
Structure of Lex program
%{
C declarations
%}
...definitions...
%%
pattern1 { action1 }
pattern2 { action2 }
...translation rules...
%%
...auxiliary functions...

Program structure
• The input file has three sections:
1. Declaration section:
• Used to declare C variables, constants and header files. These declarations are enclosed between %{ and %}.
Example:
%{
int a, b;
float c = 2.0;
%}

• Also used to write regular definitions (named regular expressions). These are written outside %{ %}.
Example:
digit [0-9]
letter [a-zA-Z]
2. The rules section

• Translation rules specify the patterns to be matched and the actions to take.

• Each rule is written as a pattern followed by an action:
%%
<pattern> { <action to take when matched> }
<pattern> { <action to take when matched> }

%%
• Patterns are specified by regular expressions.
• Actions are C language statements.
• For example:
%%
[A-Za-z]+ { printf("this is a word\n"); }
%%
3. Auxiliary functions section
• All additional C functions the program needs, for example main() and yywrap(), are defined in this section.
Lex variables
yyin - of type FILE*. It points to the file currently being scanned by the lexer.
yyout - of type FILE*. It points to the location where the output of the lexer is written.
• By default, yyin and yyout point to standard input and standard output.
yytext - of type char*. It points to the matched string, that is, the lexeme currently found.
yyleng - gives the length of the matched string (the length of yytext).
yylineno - provides the current line number (in flex, enable it with %option yylineno).
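To see how these variables relate to each other, here is a small hand-written scanner sketch in C. It is a stand-in built on stated assumptions, not Lex output. It reads from a string instead of yyin and recognizes only words [A-Za-z]+. It uses the hypothetical names tok_text, tok_leng and tok_lineno in place of yytext, yyleng and yylineno.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Lex's yytext, yyleng and yylineno. */
static char   tok_text[64];   /* like yytext: the current lexeme           */
static size_t tok_leng;       /* like yyleng: length of the current lexeme */
static int    tok_lineno = 1; /* like yylineno: current line number        */

static const char *input_pos; /* plays the role of yyin (here, a string)   */

void scan_init(const char *input)
{
    input_pos = input;
    tok_lineno = 1;
}

/* Returns 1 and fills tok_text/tok_leng when a word [A-Za-z]+ is found,
   or 0 at end of input. Newlines bump tok_lineno; other characters are skipped. */
int scan_word(void)
{
    while (*input_pos != '\0' && !isalpha((unsigned char)*input_pos)) {
        if (*input_pos == '\n')
            tok_lineno++;
        input_pos++;
    }
    if (*input_pos == '\0')
        return 0;

    const char *start = input_pos;
    while (isalpha((unsigned char)*input_pos))
        input_pos++;

    tok_leng = (size_t)(input_pos - start);
    if (tok_leng >= sizeof tok_text)          /* sketch: truncate very long words */
        tok_leng = sizeof tok_text - 1;
    memcpy(tok_text, start, tok_leng);
    tok_text[tok_leng] = '\0';
    return 1;
}
```

After scan_init("int x\n  y"), three calls to scan_word() yield "int" (length 3, line 1), "x" (line 1) and "y" (line 2). A fourth call returns 0.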

Lex functions
• yylex() - the function that starts the analysis. It is generated automatically by Lex.
• It reads the input stream and generates tokens according to the regular expressions (patterns) written in the rules section.
• yywrap() - called when the end of the file (or input) is reached. If it returns 1, scanning stops. If it returns 0, scanning continues, normally after yyin has been pointed at another input.
• yytext - of type char*. It contains the lexeme currently found, so an action can use it to print or report the matched token.
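The yywrap() convention can be illustrated with a small C sketch that treats a list of strings as a sequence of input files. The names wrap() and count_all() are hypothetical. wrap() follows the return convention described above: 0 means another input is ready and scanning continues, and 1 means all input is finished and scanning stops.

```c
#include <stddef.h>

/* Hypothetical list of inputs standing in for a sequence of files. */
static const char *const *inputs;
static size_t n_inputs, current;

/* Like yywrap(): called when the current input is exhausted.
   Returns 0 after switching to the next input (continue),
   or 1 when no input is left (stop). */
static int wrap(void)
{
    if (current + 1 < n_inputs) {
        current++;
        return 0;
    }
    return 1;
}

/* Driver in the spirit of yylex(): counts the characters of all inputs,
   asking wrap() what to do each time one input runs out. */
size_t count_all(const char *const list[], size_t n)
{
    inputs = list;
    n_inputs = n;
    current = 0;
    if (n == 0)
        return 0;

    size_t count = 0;
    for (;;) {
        for (const char *p = inputs[current]; *p != '\0'; p++)
            count++;
        if (wrap() != 0)   /* 1: input finished, stop scanning */
            break;
    }
    return count;
}
```

With the inputs {"ab", "cde", ""} and n = 3, count_all returns 5.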

Examples
%{
#include <stdio.h>
%}
%%
if|else|while|do|switch|case { printf("%s is keyword \n", yytext); }
[a-zA-Z][a-zA-Z0-9]* { printf("%s is identifier \n", yytext); }
[0-9]+(\.[0-9]+)?(E[+\-]?[0-9]+)? { printf("%s is number \n", yytext); }
"<"|"<="|"="|"<>"|">"|">=" { printf("%s is relational operator \n", yytext); }
"!"|"@"|"*"|"&"|"^"|"%"|"$"|"#" {printf("%s Special Character\n",yytext);}
%%
int yywrap()
{
return 1;
}
int main(void)
{
printf("Enter a string of data\n");
yylex();
return 0;
}
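The keyword rule is listed before the identifier rule for a reason. A word such as while matches both patterns with the same length, and Lex then chooses the rule that appears first. The C sketch below mimics that decision for a single lexeme that has already been separated from the input. The function classify() is hypothetical. To keep it short, it recognizes only integers, not the fractional and exponent parts of the number pattern.

```c
#include <ctype.h>
#include <string.h>

/* Keywords from the example's first rule, in the same order. */
static const char *const keywords[] = { "if", "else", "while", "do", "switch", "case" };

/* Hypothetical helper: classifies one whole lexeme the way the example's
   rules would. Keywords are tested first because "while" also matches the
   identifier pattern with the same length, and Lex then picks the earlier rule. */
const char *classify(const char *lex)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lex, keywords[i]) == 0)
            return "keyword";

    if (isalpha((unsigned char)lex[0])) {        /* [a-zA-Z][a-zA-Z0-9]* */
        for (const char *p = lex + 1; *p != '\0'; p++)
            if (!isalnum((unsigned char)*p))
                return "unknown";
        return "identifier";
    }

    if (isdigit((unsigned char)lex[0])) {        /* [0-9]+ (integers only) */
        for (const char *p = lex + 1; *p != '\0'; p++)
            if (!isdigit((unsigned char)*p))
                return "unknown";
        return "number";
    }
    return "unknown";
}
```

classify("while") gives "keyword", while classify("whilex") gives "identifier", since the keyword rule only wins on an exact match.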
Examples
%{
#include <stdio.h>
%}
digit [0-9]
letter [A-Za-z]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
if|else|while|do|switch|case { printf("%s is keyword \n", yytext); }
{id} { printf("%s is identifier \n", yytext); }
{number} { printf("%s is number \n", yytext); }
"<"|"<="|"="|"<>"|">"|">=" { printf("%s is relational operator \n", yytext); }
"!"|"@"|"*"|"&"|"^"|"%"|"$"|"#" {printf("%s Special Character\n",yytext);}
%%
int yywrap()
{
return 1;
}

int main(void)
{
printf("Enter a string of data\n");
yylex();
return 0;
}
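You can check the regular definition number by hand to see exactly what it accepts. This C sketch tests whether a whole string matches {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?. The helper names digits() and is_number() are hypothetical. Unlike Lex, the sketch does not search for the longest matching prefix. It only answers yes or no for the complete string.

```c
#include <ctype.h>
#include <stdbool.h>

/* Advances *p over one or more digits; returns false if there is no digit. */
static bool digits(const char **p)
{
    if (!isdigit((unsigned char)**p))
        return false;
    while (isdigit((unsigned char)**p))
        (*p)++;
    return true;
}

/* Returns true if the whole string s matches
   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?  */
bool is_number(const char *s)
{
    const char *p = s;
    if (!digits(&p))              /* {digit}+          */
        return false;
    if (*p == '.') {              /* (\.{digit}+)?     */
        p++;
        if (!digits(&p))
            return false;
    }
    if (*p == 'E') {              /* (E[+\-]?{digit}+)? */
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!digits(&p))
            return false;
    }
    return *p == '\0';            /* nothing may follow */
}
```

is_number("6.02E+23") is true. The strings "3.", ".5" and "2e5" are rejected: the fraction needs at least one digit on each side of the point, and the pattern only allows an upper-case E.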
How to run lex
• At the command prompt, type:
• lex name.l (or flex name.l)
• cc lex.yy.c
• ./a.out (on Windows the executable is named a.exe)
Lexical analyzer
• flex is a tool for generating scanners. A scanner is a program
which recognizes lexical patterns in text.

• The flex program reads the given input files, or its standard
input if no file names are given, for a description of a scanner
to generate.

• The description is in the form of pairs of regular expressions and C code, called rules.
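When several rules could match, Lex chooses the longest match. Between rules that match the same number of characters, it chooses the one listed first. The C sketch below applies that policy to the relational-operator patterns from the examples. The function match_relop() and its rule table are hypothetical illustrations, not flex internals.

```c
#include <stddef.h>
#include <string.h>

/* Relational-operator patterns from the examples, in rule order. */
static const char *const relops[] = { "<", "<=", "=", "<>", ">", ">=" };

/* Sketch of Lex's matching policy: the longest match wins, and among
   rules matching the same length the earlier rule wins.
   Returns the index of the chosen rule (-1 if none); *len gets its length. */
int match_relop(const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < (int)(sizeof relops / sizeof relops[0]); i++) {
        size_t n = strlen(relops[i]);
        /* strictly longer only, so an earlier rule keeps a tie */
        if (n > best_len && strncmp(input, relops[i], n) == 0) {
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

For the input "<=x" the function returns rule 1 ("<=") with length 2. As a result, "<=" becomes one token instead of "<" followed by "=".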
Lab 2
Syntax analyzer
Yacc
• Yacc stands for Yet Another Compiler-Compiler.
• It is a tool that generates a parser (syntax analyzer).
• There are four steps involved in creating a compiler with Yacc:
1. Specify the grammar:
– Write the grammar in a .y file, including the actions (in C) to be taken.
– Write a lexical analyzer to process the input and pass tokens to the parser. This can be done using Lex.
– Write a function that starts parsing by calling yyparse().
– Write error-handling routines (such as yyerror()).
2. Generate a parser by running Yacc over the grammar file.
3. Compile the code produced by Yacc as well as any other relevant source files.
4. Link the object files with the appropriate libraries to produce the executable parser.
Lex Yacc interaction
• Yacc specification (yacc.y) → Yacc compiler → y.tab.c and y.tab.h (token definitions)
• Lex specification (lex.l), using the token definitions in y.tab.h → Lex compiler → lex.yy.c
• lex.yy.c and y.tab.c → C compiler → a.out
• input stream → a.out → output stream

Lex Yacc interaction…
• If lex is to return tokens that yacc will process, the two programs must agree on what tokens exist. This is done as follows:
• The yacc file declares its tokens in the definitions section, for example:
%token INTEGER
• When the yacc file is translated with yacc -d, a header file y.tab.h is created that contains definitions such as:
#define INTEGER 258
• This header can then be included in both the lex and the yacc program.
• An action in the lex file can then execute return INTEGER;, and the yacc program can match on this token.
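Seen from the C side, this agreement simply means that both programs use the same integer codes. In this sketch the #define mirrors the y.tab.h line above, although the actual number is assigned by yacc. The names next_token(), set_input() and token_value are hypothetical stand-ins for a Lex-generated yylex() and for yylval. Single-character tokens such as '+' are returned as their own character codes. For that reason yacc numbers named tokens above the range of character codes.

```c
#include <ctype.h>
#include <stdlib.h>

/* What "yacc -d" might put in y.tab.h for "%token INTEGER". */
#define INTEGER 258

static int token_value;        /* hypothetical stand-in for yylval */
static const char *cursor;     /* hypothetical input position      */

void set_input(const char *s) { cursor = s; }

/* Hypothetical hand-written scanner: skips blanks, returns INTEGER for a
   run of digits (storing its value), 0 at end of input, and any other
   character as its own token code (e.g. '+'). */
int next_token(void)
{
    while (*cursor == ' ' || *cursor == '\t')
        cursor++;
    if (*cursor == '\0')
        return 0;                          /* 0 tells the parser: end of input */
    if (isdigit((unsigned char)*cursor)) {
        char *end;
        token_value = (int)strtol(cursor, &end, 10);
        cursor = end;
        return INTEGER;
    }
    return (unsigned char)*cursor++;       /* single-character token */
}
```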

Variables and functions used in a Yacc program
yylval - lex returns the value associated with a token in the variable yylval.
yylval = atoi(yytext); - converts the matched text to an integer and stores it in yylval.
yytext - a pointer to the matched input string.
yywrap() - called by the lex scanner when the input is exhausted (it returns 1 when the input is finished).
yyparse() - the parser function generated by Yacc. It calls yylex() to read tokens and executes the actions of the grammar rules. It returns 0 if the input is accepted (the string is valid) and a nonzero value on a syntax error.
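The following C sketch ties these names together for a tiny grammar, sum : NUMBER | sum '+' NUMBER ;. It is hand-written, not generated by Yacc. lex() plays the role of yylex() and stores lval = atoi(text), the way a Lex action would store yylval = atoi(yytext). parse_sum() follows the yyparse() convention of returning 0 when the input is accepted. All names here are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

enum { END = 0, NUMBER = 258 };    /* hypothetical token codes */

static const char *src;            /* input being scanned */
static char text[32];              /* like yytext         */
static int lval;                   /* like yylval         */

/* Like yylex(): reads one token; for a number it sets lval = atoi(text). */
static int lex(void)
{
    while (*src == ' ')
        src++;
    if (*src == '\0')
        return END;
    if (isdigit((unsigned char)*src)) {
        size_t n = 0;
        while (isdigit((unsigned char)*src) && n < sizeof text - 1)
            text[n++] = *src++;
        text[n] = '\0';
        lval = atoi(text);         /* yylval = atoi(yytext) */
        return NUMBER;
    }
    return (unsigned char)*src++;
}

/* Like yyparse() for  sum : NUMBER | sum '+' NUMBER ;
   Returns 0 if the input is accepted (storing the sum in *result),
   or 1 on a syntax error. */
int parse_sum(const char *input, int *result)
{
    src = input;
    if (lex() != NUMBER)
        return 1;
    int total = lval;
    for (;;) {
        int tok = lex();
        if (tok == END) {
            *result = total;
            return 0;              /* accepted */
        }
        if (tok != '+' || lex() != NUMBER)
            return 1;              /* syntax error */
        total += lval;
    }
}
```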
Structure of program
• As with Lex, a Yacc program is divided into three sections separated by double percent signs (%%).
• A yacc specification consists of three parts:

yacc declarations, and C declarations within %{ %}
%%
translation rules
%%
user-defined auxiliary procedures


yacc declaration
The first part of a yacc specification includes:
• C declarations enclosed in %{ %}
• Yacc definitions, such as %token
Named tokens must be declared in the declaration part using
%token TokenName
translation rules
The translation rules are productions with semantic actions:
production1 {semantic action1}
production2 {semantic action2}
...
productionn {semantic actionn}
Writing a Grammar/production in Yacc
• A grammar is a set of productions. In each production, the LHS is followed by a colon and then the RHS.
• Multiple RHSs for the same LHS are separated by "|".
• Actions associated with a rule are written in braces.
• Productions in Yacc are of the form:
Nonterminal : tokens/nonterminals { action }
| tokens/nonterminals { action }
;
• Tokens that are single characters can be used directly within productions, e.g. '+'.
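yacc production example
• head → body1 | body2 is written as:
head : body1 {semantic action}
| body2 {semantic action}
;
• Two special symbols are used:
• $$ represents the attribute value of the LHS nonterminal.
• $i represents the value of the ith symbol of the body.
• Example: for E → E + T, $$ is the head E, $1 is the E in the body, $2 is '+', and $3 is T:
E : E '+' T {$$ = $1 + $3;}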
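Yacc keeps a value stack alongside the parser's symbol stack. At the moment of a reduction, $1, $2 and $3 refer to the stack entries for the body symbols. The C sketch below simulates that behaviour for E : E '+' T { $$ = $1 + $3; } | T ;, where T is a single digit. The names eval() and struct entry are hypothetical. The reduction logic is hand-coded rather than driven by a generated LALR table.

```c
#include <ctype.h>

/* Grammar symbols kept on the stack, each paired with its value. */
enum sym { SYM_E, SYM_PLUS, SYM_T };
struct entry { enum sym sym; int val; };

/* Evaluates strings such as "1+2+3" (single digits, no spaces) for
   E : E '+' T { $$ = $1 + $3; } | T ;   T : digit ;
   Returns 0 and stores the value of E in *out, or -1 on a syntax error. */
int eval(const char *s, int *out)
{
    struct entry stack[64];
    int top = 0;                                   /* number of entries */

    for (; *s != '\0'; s++) {
        if (top >= 62)
            return -1;                             /* keep the sketch bounded */
        if (isdigit((unsigned char)*s))
            stack[top++] = (struct entry){ SYM_T, *s - '0' };   /* shift, reduce to T */
        else if (*s == '+')
            stack[top++] = (struct entry){ SYM_PLUS, 0 };       /* shift '+' */
        else
            return -1;

        if (stack[top - 1].sym == SYM_T) {
            if (top >= 3 && stack[top - 3].sym == SYM_E && stack[top - 2].sym == SYM_PLUS) {
                /* reduce E '+' T: $1 = stack[top-3], $2 = stack[top-2], $3 = stack[top-1] */
                int v = stack[top - 3].val + stack[top - 1].val;   /* $$ = $1 + $3 */
                top -= 3;
                stack[top++] = (struct entry){ SYM_E, v };
            } else if (top == 1) {
                stack[0].sym = SYM_E;              /* reduce T to E: $$ = $1 */
            } else {
                return -1;
            }
        }
    }
    if (top != 1 || stack[0].sym != SYM_E)
        return -1;
    *out = stack[0].val;
    return 0;
}
```

For eval("1+2+3", &v), the parser first reduces 1+2 to an E with value 3 and then reduces 3+3 to an E with value 6.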
How to run yacc

• yacc -d name.y
• flex name.l
• cc lex.yy.c y.tab.c
• ./a.out
Lex
A program for validating strings accepted by the language L = {a^n | n >= 1} = {a, aa, aaa, ...}
Grammar: S → aS | a
Note: y.tab.h is generated by yacc. When you compile the yacc program with yacc -d, the Yacc compiler creates two separate files. y.tab.c contains the C code of the parser, and y.tab.h contains the definitions of the tokens. The lex program needs those token definitions, so it includes y.tab.h.
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[a] {return A;}
[\n] {return yytext[0];}
. {return 0;}
%%
int yywrap(void)
{
return 1;
}
YACC

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}
%token A
%%
start : S '\n' {YYACCEPT;}
;
S : A S
| A
;
%%
int main(void)
{
printf("\n enter string: ");
if (yyparse() == 0)
printf("\n valid\n");
return 0;
}
void yyerror(const char *s)
{
printf("\n not accepted\n");
}
END
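The same language can also be recognized without Yacc. This C sketch is a hand-written recursive-descent recognizer that follows S → aS | a and start : S '\n' directly. Yacc itself would build an LALR parsing table instead. The functions parse_S() and accepts() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Recursive recognizer mirroring S -> a S | a  (the language a^n, n >= 1).
   Returns a pointer just past the S it recognized, or NULL if there is none. */
static const char *parse_S(const char *p)
{
    if (*p != 'a')
        return NULL;                     /* both alternatives start with 'a' */
    const char *rest = parse_S(p + 1);   /* try S -> a S */
    return rest != NULL ? rest : p + 1;  /* otherwise S -> a */
}

/* Mirrors  start : S '\n' ;  the line must be a^n (n >= 1) followed by '\n'. */
bool accepts(const char *line)
{
    const char *p = parse_S(line);
    return p != NULL && p[0] == '\n' && p[1] == '\0';
}
```

accepts("aaa\n") is true, while accepts("\n") and accepts("ab\n") are false.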
