HTML Syntax Analyzer: Project::Compiler Construction

2020
PROJECT::COMPILER CONSTRUCTION
[HTML SYNTAX ANALYZER]

UZAIR ASLAM :: ROLL NO : BCSM-F16-046
IFTIKHAR HASSAN :: ROLL NO : BCSM-F16-040
ASMA :: ROLL NO : BCSM-F16-214
Project title:
HTML syntax analyzer.
Scope of the Project
The project can be used for the following purposes:
• Basic editor: the user can write HTML code in the program.
• Analyzer: the compiler then analyzes the code and determines what it means.
• Error finding: the compiler then reports whether the code the user typed contains an error; it can classify the input as a string, or return it exactly as the user entered it.
• Understandable: the error messages are user-friendly, unlike the typical compiler messages that users often find hard to understand.
Introduction
The HTML syntax analyzer examines the HTML code that the user provides at run time and determines what each part of it means. It is built with the Lex tool. For example, if the user enters <title>, the analyzer recognizes it and reports that it is a starting title tag; if the user enters </title>, it reports that it is an ending title tag. In short, it analyzes the syntax of HTML and tells the user what it means.
Regular Expression module:

Numbers      [0-9][0-9]*
String       [A-Za-z0-9" "]+
Special chr  [-+,@#$%^&*)(!]+
tagstart     "<"
tagend       ">"
endingtagst  "</"
str          {spchr}
titles       {tagstart}{title}{tagend}
titlend      {entagst}{title}{tagend}
htmls        {tagstart}{html}{tagend}
htmlend      {entagst}{html}{tagend}
bodys        {tagstart}{body}{tagend}
bodye        {entagst}{body}{tagend}
hes          {tagstart}{heading}{tagend}
paras        {tagstart}[pP]{tagend}
parae        {entagst}[pP]{tagend}

DFA modules:
Lex Code:
%{
#include<stdio.h>
%}
doctag "<!DOCTYPE html>"
html "html"
body "body"
heading [h][0-9]
alpabets [a-zA-Z][a-zA-Z]*
numbers [0-9][0-9]*
spchr [-+,@#$%^&*)(!]+
string [A-Za-z0-9" "]+
tagstart "<"
tagend ">"
entagst "</"
str {spchr}
title "title"
titles {tagstart}{title}{tagend}
titlend {entagst}{title}{tagend}
htmls {tagstart}{html}{tagend}
htmlend {entagst}{html}{tagend}
bodys {tagstart}{body}{tagend}
bodye {entagst}{body}{tagend}
hes {tagstart}{heading}{tagend}
hede {entagst}{heading}{tagend}
head "head"
heads {tagstart}{head}{tagend}
heade {entagst}{head}{tagend}
paras {tagstart}[pP]{tagend}
parae {entagst}[pP]{tagend}
%%
{doctag} {printf("DOCTYPE tag start: \n");}
{htmls} {printf("html tag start: \n");}
{htmlend} {printf("html tag end ");}
{bodys} {printf("body tag start: \n");}
{bodye} {printf("body tag end ");}
{hes} {printf("heading tag start: \n");}
{hede} {printf(" heading tag end ");}
{string} {printf(" string ");}
{str} {printf(" special character ");}
{titles} {printf(" title tag start: ");}
{titlend} {printf(" title tag end ");}
{heads} {printf(" head tag start: ");}
{heade} {printf(" head tag end ");}
{paras} {printf(" paragraph tag start: ");}
{parae} {printf(" paragraph tag end ");}
%%
int yywrap()
{
return 1;
}
int main()
{
/* Prompt once, then classify standard input until end of file. */
printf("enter: ");
yylex();
return 0;
}
CFG MODULES
Numbers:
S -> AS|B|NULL
A -> 0|1|2|….|9
B -> 0|1|2|….|9|NULL
String:
S -> A|B|C|D
A -> A|B|C|…|Z
B ->a|b|c|..|z
C -> 0|1|2|….|9
D -> “ ”
Special chr:
S -> A|B|C|D|E|F|G|H|I|J|K|L
A -> +
B -> -
C -> @
D -> #
E-> $
F -> %
G -> ^
H -> &
I -> !=
J -> (
K -> )
L -> !
Tagstart:
S -> B
B-> <
Tagend :
S -> B
B-> >
Endingtagst:
S -> B
B-> </
Title:
S-> AS|BS|C
A-> tagstart
B->title
C->tagend
Titlend:
S-> AS|BS|C
A-> entagst
B->title
C->tagend
htmls:
S-> AS|BS|C
A-> tagstart
B->html
C->tagend
htmlend:
S-> AS|BS|C
A-> entagst
B->html
C->tagend
bodys:
S-> AS|BS|C
A-> tagstart
B->body
C->tagend
bodye:
S-> AS|BS|C
A-> entagst
B->body
C->tagend
hes:
S-> AS|BS|C
A-> tagstart
B->heading
C->tagend
paras:
S-> AS|BS|C
A-> tagstart
B->p|P
C->tagend
parae:
S-> AS|BS|C
A-> entagst
B->p|P
C->tagend
FIRST & FOLLOW

NUMBERS:
CFG FIRST FOLLOW
S -> AS|B|NULL {0|1|2|….|9|NULL} $
A -> 0|1|2|….|9 {0|1|2|….|9} {0|1|2|….|9|$}
B -> 0|1|2|….|9|NULL {0|1|2|….|9|NULL} $
STRINGS:
CFG FIRST FOLLOW

S -> A|B|C|D {A|B|C|…|Z|a|b|c|..|z|0|1|2|….|9|“ ”} $
A -> A|B|C|…|Z { A|B|C|…|Z } $
B ->a|b|c|..|z { a|b|c|..|z } $
C -> 0|1|2|….|9 {0|1|2|….|9} $
D -> “ ” {“ ”} $
Special chr:
CFG FIRST FOLLOW

S -> A|B|C|D|E|F|G|H|I|J|K|L {+-,@#$%^&*)(!} $
A -> + {+} $
B -> - {-} $
C -> @ {@} $
D -> # {#} $
E-> $ {$} $
F -> % {%} $
G -> ^ {^} $
H -> & {&} $
I -> != {!=} $
J -> ( {(} $
K -> ) {)} $
L -> ! {!} $
Tagstart:
CFG FIRST FOLLOW
S -> B {<} $
B-> < {<} $
TagEND:
CFG FIRST FOLLOW
S -> B {>} $
B-> > {>} $
Endingtagst:
CFG FIRST FOLLOW
S -> B {</} $
B-> </ {</} $
Title
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|title|tagend } $
A-> tagstart { tagstart } { tagstart|title|tagend }
B->title { title} { tagstart|title|tagend }
C->tagend {tagend} $
Titlend
CFG FIRST FOLLOW
S-> AS|BS|C { entagst|title|tagend } $
A-> entagst { entagst } { entagst|title|tagend }
B->title { title} { entagst|title|tagend }
htmls
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|html|tagend } $
A-> tagstart { tagstart } { tagstart|html|tagend }
B->html { html} { tagstart|html|tagend }
htmlend
CFG FIRST FOLLOW
S-> AS|BS|C { entagst |html|tagend } $
A-> entagst { entagst } { entagst |html|tagend }
B->html {html} { entagst |html|tagend }
bodys
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|body|tagend } $
A-> tagstart { tagstart } { tagstart|body|tagend }
B->body { body} { tagstart|body|tagend }
bodye
CFG FIRST FOLLOW
S-> AS|BS|C { entagst |body|tagend } $
A-> entagst { entagst } { entagst |body|tagend }
B->body {body} { entagst |body|tagend }
hes
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|heading|tagend } $
A-> tagstart { tagstart } { tagstart|heading|tagend }
B->heading {heading} { tagstart|heading|tagend }
paras
CFG FIRST FOLLOW
S-> AS|BS|C { tagstart|p|P|tagend } $
A-> tagstart { tagstart } { tagstart|p|P|tagend }
B->p|P {p|P} { tagstart|p|P|tagend }
parae
CFG FIRST FOLLOW
S-> AS|BS|C { entagst|p|P|tagend } $
A-> entagst { entagst } { entagst|p|P|tagend }
B->p|P {p|P} { entagst|p|P|tagend }
Parse table:
[Figure: parse table for Special chr]
Stack and parse tree
[Figures: stack and parse tree for Numbers, String, Special chr, Tagstart, Tagend and Entagst]
Semantic Analysis:
[Figures: semantic analysis for bodys, Numbers, Bodye, Htmle and htmls]
Results and conclusions
Conclusion
In conclusion, the analyzer identifies the HTML tags defined in its Lex rules and tells the user what each one means; for example, given <head>, it reports that this is the starting head tag.
Results
