
2020

PROJECT::COMPILER CONSTRUCTION

[HTML SYNTAX ANALYZER]


UZAIR ASLAM :: ROLL NO : BCSM-F16-046
IFTIKHAR HASSAN :: ROLL NO : BCSM-F16-040
ASMA :: ROLL NO : BCSM-F16-214
Project title:
HTML syntax analyzer.

Scope of the Project

The project can be used for the following purposes.

• Basic editor: the user can write HTML code in the program.
• Analyzer: the compiler then analyzes what the code means.
• Error finding: the compiler reports whether the program the user typed contains an error; it can identify the input as a string, or return the expression the user entered as it is.
• Understandable: the error messages will be user friendly, unlike the cryptic compiler messages that users often do not understand.

Introduction
The HTML syntax analyzer analyzes the syntax of HTML and determines what the code that the user
provides at run time means. The Lex tool is used for this purpose. The user can enter HTML tags:
for example, if the user enters <title>, the analyzer recognizes it and reports that it is a starting
title tag, and if the user enters </title>, it reports that it is the ending title tag. In short, it
analyzes the syntax of HTML and tells the user what each part of it means.

Regular Expression module:


Numbers  [0-9] [0-9]* titlend  {entagst}{title}{tagend}

String  [A-Za-z0-9" "]+ htmls {tagstart}{html}{tagend}

Special chr  [+-,@#$%^&*)(!]+ htmlend {entagst}{html}{tagend}

tagstart  "<" bodys {tagstart}{body}{tagend}

tagend  ">" bodye {entagst}{body}{tagend}

endingtagst  "</" hes {tagstart}{heading}{tagend}

str {spchr} paras  {tagstart}[pP]{tagend}

titles  {tagstart}{title}{tagend} parae {entagst}[pP]{tagend}


DFA modules:
Lex Code:

%{
/* HTML syntax analyzer: reads HTML from standard input and reports which
   tag, string or special character each piece of the input is. */
#include <stdio.h>
%}
doctag "<!DOCTYPE html>"
html "html"
body "body"
heading [hH][1-6]
alpabets [a-zA-Z][a-zA-Z]*
numbers [0-9][0-9]*
spchr [-+,@#$%^&*)(!]+
string [A-Za-z0-9 ]+
tagstart "<"
tagend ">"
entagst "</"
str {spchr}
title "title"
titles {tagstart}{title}{tagend}
titlend {entagst}{title}{tagend}
htmls {tagstart}{html}{tagend}
htmlend {entagst}{html}{tagend}
bodys {tagstart}{body}{tagend}
bodye {entagst}{body}{tagend}
hes {tagstart}{heading}{tagend}
hede {entagst}{heading}{tagend}
head "head"
heads {tagstart}{head}{tagend}
heade {entagst}{head}{tagend}

paras {tagstart}[pP]{tagend}
parae {entagst}[pP]{tagend}
%%
{doctag} {printf("DOCTYPE tag start: \n");}
{htmls} {printf("html tag start: \n");}
{htmlend} {printf("html tag end ");}
{bodys} {printf("body tag start: \n");}
{bodye} {printf("body tag end ");}
{hes} {printf("heading tag start: \n");}
{hede} {printf(" heading tag end ");}
{string} {printf(" string ");}
{str} {printf(" special character ");}
{titles} {printf(" title tag start: ");}
{titlend} {printf(" title tag end ");}
{heads} {printf(" head tag start: ");}
{heade} {printf(" head tag end ");}
{paras} {printf(" paragraph tag start: ");}
{parae} {printf(" paragraph tag end ");}
%%
/* Called by yylex() at end of input; returning 1 means there is no more input. */
int yywrap()
{
return 1;
}

int main()
{
printf("enter: ");
yylex();
return 0;
}
CFG MODULES

Numbers:
S -> AS|B|NULL
A -> 0|1|2|….|9
B -> 0|1|2|….|9|NULL

String:
S -> A|B|C|D
A -> 'A'|'B'|'C'|…|'Z'
B -> 'a'|'b'|'c'|…|'z'
C -> 0|1|2|…|9
D -> ' ' (space)

Special chr:

S -> A|B|C|D|E|F|G|H|I|J|K|L

A -> +

B -> -
C -> @
D -> #
E-> $
F -> %
G -> ^
H -> &
I -> !=
J -> (
K -> )
L -> !
Tagstart:
S -> B
B-> <
Tagend :
S -> B
B-> >
Endingtagst:
S -> B
B-> </
Title:
S-> AS|BS|C
A-> tagstart
B->title
C->tagend

Titlend:
S-> AS|BS|C
A-> entagst
B->title
C->tagend

htmls:
S-> AS|BS|C
A-> tagstart
B->html
C->tagend

htmlend:
S-> AS|BS|C
A-> entagst
B->html
C->tagend

bodys:
S-> AS|BS|C
A-> tagstart
B->body
C->tagend

bodye:
S-> AS|BS|C
A-> entagst
B->body
C->tagend
hes:
S-> AS|BS|C
A-> tagstart
B->heading
C->tagend

paras:
S-> AS|BS|C
A-> tagstart
B->p|P
C->tagend

parae:
S-> AS|BS|C
A-> entagst
B->p|P
C->tagend

FIRST & FOLLOW


NUMBERS:
CFG FIRST FOLLOW
S -> AS|B|NULL {0|1|2|…|9|NULL} $
A -> 0|1|2|…|9 {0|1|2|…|9} {0|1|2|…|9|$}
B -> 0|1|2|…|9|NULL {0|1|2|…|9|NULL} $

STRINGS:

CFG FIRST FOLLOW


S -> A|B|C|D {'A'|'B'|…|'Z'|'a'|'b'|…|'z'|0|1|…|9|' '} $
A -> 'A'|'B'|'C'|…|'Z' {'A'|'B'|'C'|…|'Z'} $
B -> 'a'|'b'|'c'|…|'z' {'a'|'b'|'c'|…|'z'} $
C -> 0|1|2|…|9 {0|1|2|…|9} $
D -> ' ' {' '} $
Special chr:

CFG FIRST FOLLOW


S -> A|B|C|D|E|F|G|H|I|J|K|L {+|-|@|#|$|%|^|&|!=|(|)|!} $
A -> + {+} $
B -> - {-} $
C -> @ {@} $
D -> # {#} $
E-> $ {$} $
F -> % {%} $
G -> ^ {^} $
H -> & {&} $
I -> != {!=} $
J -> ( {(} $
K -> ) {)} $
L -> ! {!} $

Tagstart:
CFG FIRST FOLLOW
S -> B {<} $
B-> < {<} $

TagEND:
CFG FIRST FOLLOW
S -> B {>} $
B-> > {>} $

Endingtagst:
CFG FIRST FOLLOW
S -> B {</} $
B-> </ {</} $
Title
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|title|tagend } $
A-> tagstart { tagstart } { tagstart|title|tagend }
B->title { title} { tagstart|title|tagend }
C->tagend {tagend} $

Titlend
CFG FIRST FOLLOW
S-> AS|BS|C { entagst|title|tagend } $
A-> entagst { entagst } { entagst|title|tagend }
B->title { title} { entagst|title|tagend }
C->tagend {tagend} $

htmls
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|html|tagend } $
A-> tagstart { tagstart } { tagstart|html|tagend }
B->html { html} { tagstart|html|tagend }
C->tagend {tagend} $

htmlend
CFG FIRST FOLLOW
S-> AS|BS|C { entagst |html|tagend } $
A-> entagst { entagst } { entagst |html|tagend }
B->html {html} { entagst |html|tagend }
C->tagend {tagend} $
bodys
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|body|tagend } $
A-> tagstart { tagstart } { tagstart|body|tagend }
B->body { body} { tagstart|body|tagend }
C->tagend {tagend} $

bodye
CFG FIRST FOLLOW
S-> AS|BS|C { entagst |body|tagend } $
A-> entagst { entagst } { entagst |body|tagend }
B->body {body} { entagst |body|tagend }
C->tagend {tagend} $

hes
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|heading|tagend } $
A-> tagstart { tagstart } { tagstart|heading|tagend }
B->heading {heading} { tagstart|heading|tagend }
C->tagend {tagend} $

paras
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|p|P|tagend } $
A-> tagstart { tagstart } { tagstart|p|P|tagend }
B->p|P {p|P} { tagstart|p|P|tagend }
C->tagend {tagend} $
parae
CFG FIRST FOLLOW
S-> AS|BS|C { entagst|p|P|tagend } $
A-> entagst { entagst } { entagst|p|P|tagend }
B->p|P {p|P} { entagst|p|P|tagend }
C->tagend {tagend} $

Parse table:
[Figure: parse table for special chr]

Stack and parse tree
[Figures: stack and parse tree for Numbers, String, Special chr, Tagstart, Tagend and Entagst]
Semantic Analysis:
[Figures: semantic analysis of bodys, Numbers, Bodye, Htmle and htmls]
Results and conclusions
Conclusion
This shows that the analyzer can identify the HTML constructs it was written for and tell the user
what each one means. These are the DOCTYPE declaration; the html, head, title, body, heading and
paragraph tags; strings; and special characters. For example, for <head> it reports that this is
the starting head tag.

Results
