Unit-1
A computer understands instructions in machine code, i.e. in
the form of 0s and 1s.
Compilers
Assemblers
Interpreters
Compiler
The language processor that reads the complete source
program, written in a high-level language, as a whole in one go
and translates it into an equivalent program in machine
language is called a compiler.
Example: C, C++, C#, Java.
In a compiler, the source code is translated to object code
successfully only if it is free of errors. When there are errors
in the source code, the compiler reports them, with line
numbers, at the end of compilation.
Loaders
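Often a compiler, assembler or linker will produce code that
is not yet completely fixed and ready to execute, but whose
principal memory references are all made relative to an
undetermined starting location that can be anywhere in memory.
Such code is said to be relocatable, and a loader resolves all
relocatable addresses relative to a given base (starting) address.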
Debugger
A debugger is a program that allows a programmer to examine a
program's data while that program is running.
Introduction to Language Processor
The designer expresses the ideas in terms related to the
application domain.
Figure: the semantic gap between the application domain and the execution domain.
The semantic gap causes many difficulties, some of the
important ones being
large development time and effort, and
poor quality of software.
Now the semantic gap is bridged by the software
engineering steps.
Figure: the application domain, the PL domain and the execution domain.
The specification gap is bridged by the software development
team.
• A range of language processors (LPs) is defined to meet practical requirements.
Data Structure for Language Processing
Insert
Search
Delete
Algorithm for Generic Search
1. Predict the entry e in the search data structure at which
symbol symb may be stored
Table Organization
Sequential Search Organization
Search organization
Table Organization
Free Entries
Physical Deletion
Source program -> Compiler -> Target program
Abstract-Syntax Tree for position = initial + rate * 60:
=
  position
  +
    initial
    *
      rate
      60
Front-end, Back-end division
Figure: source code -> front end -> IR -> back end -> machine code; both the front end and the back end report errors.
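Phases of a compiler (source program in, target program out):
1 Lexical analyzer
2 Syntax analyzer
3 Semantic analyzer
4 Intermediate code generator
5 Code optimizer
6 Code generator
7 Peephole optimization
Phases 1-3 form the analysis part and phases 4-7 the synthesis part. The symbol-table manager and the error handler interact with all phases.
Front end: phases 1, 2, 3, 4, 5
Back end: phases 6, 7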
Lexical Analysis
For position = initial + rate * 60:
Lexeme     Token
position   id1
=          assignment operator
initial    id2
+          addition operator
rate       id3
*          multiplication operator
60         literal 1
Syntax Analysis
:=
  identifier: id1 (position)
  expression +
    expression
      identifier: id2 (initial)
    expression *
      expression
        identifier: id3 (rate)
      expression
        literal: 60
After semantic analysis, the integer literal 60 is wrapped in an int-to-float conversion:
:=
  identifier: id1 (position)
  expression +
    expression
      identifier: id2 (initial)
    expression *
      expression
        identifier: id3 (rate)
      expression
        (int to float)
          literal: 60
Intermediate phase
temp4 = id3 * 60.0
id1 = id2 + temp4
Code Generation
Mov id3, R1
Mul #60.0, R1
Mov id2, R2
Add R2, R1
Mov R1, id1
The Symbol Table
Tokens
The syntax tree
The symbol table
The literal table
The intermediate code
Temporary files
Bootstrapping
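Figure (T-diagram): a compiler written in language H that translates source language S into target language T.
Figure: a compiler from A to B and a compiler from B to C, both written in H, combine into a compiler from A to C written in H.
Figure: a compiler from A to B written in H, translated by a compiler from H to K that is written in M, gives a compiler from A to B written in K.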
Cross compiler
Compiler source code retargeted to K
Result in Retargeted Compiler
Compiler-Construction Tools
Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines
Analysis Tools
Structure editors
Pretty printers
Static checkers
Interpreters
Lexical Analysis
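Figure: source program -> lexical analyzer -> token -> parser -> to semantic analysis. The parser requests each token by calling getNextToken, and both the lexical analyzer and the parser use the symbol table.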
Why separate lexical analysis and
parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes
Token        Informal description                  Sample lexemes
if           characters i, f                       if
else         characters e, l, s, e                 else
comparison   < or > or <= or >= or == or !=        <=, !=
E = M * C ** 2
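<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>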
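E.g. in fi (a == f(x)) …, the lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier, so it returns fi as an identifier and leaves the decision to later phases.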
Figure: input buffer holding the characters E = M * C * * 2
Drawback
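Each time the forward pointer is advanced, we must first test whether it has reached the end of a buffer half, and only then increment it by 1. Together with the test on the character itself, this means two tests for every character read.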
Sentinels
Basic Symbols
empty string: ε
Basic Operators
disjunction (OR, union): r | s
closure (repetition): r*
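d1 -> r1
d2 -> r2
…
dn -> rn
where each di is a new name, distinct from the alphabet symbols and from the other names, and each ri is a regular expression over the alphabet and the previously defined names d1, …, di-1.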
Recognition of Tokens
One way to begin the design of any program is to describe
the behavior of the program by a flowchart.
State 1: c = getchar()
         if letter(c) or digit(c) then goto state 1
         else if delimiter(c) then goto state 2
         else fail()
In state 2 we return to the parser a pair consisting of
the integer code for an identifier, denoted by ID, and
a value that is a pointer to the symbol table entry
returned by INSTALL.
State 2: retract()
         return (ID, INSTALL())
Here retract() moves the input pointer back one character, because the delimiter just read is not part of the identifier.
E.g.
Token        Code   Value
Begin        1      -
End          2      -
If           3      -
Then         4      -
Else         5      -
identifier   6      Pointer to symbol table
Constant     7      Pointer to symbol table
<            8      1
<=           8      2
=            8      3
<>           8      4
>            8      5
>=           8      6
Finite Automata
Transition Table
State    Input symbol
         a        b
0        {0,1}    {0}
1        ---      {2}
2        ---      {3}
Deterministic Finite Automata
a = b + c * d;
ID ASSIGN ID PLUS ID MULT ID SEMI
Lex – Lexical Analyzer
Lexical analyzers tokenize input streams
An Overview of Lex
Lex Source
Lex source is separated into three sections by %%
delimiters.
The general format of Lex source is
{declarations}
%% (required)
{translation rules}
%% (optional)
{Auxiliary procedures}
The absolute minimum Lex program is thus
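%%
(no definitions and no rules), which produces a program that copies its input to its output unchanged.
PLLab, NTHU, Cs2403 Programming Languages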
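Translation rules are statements of the form
P1 {action 1}
P2 {action 2}
…
Pn {action n}
where each Pi is a regular expression and each action i is a program fragment that is executed whenever Pi matches the input.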
Lex Source Program
Lex source is a table of
regular expressions and
corresponding program fragments
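%{
#include <stdio.h>
%}
digit   [0-9]
letter  [a-zA-Z]
%%
{letter}({letter}|{digit})*   printf("id: %s\n", yytext);
\n                            printf("new line\n");
%%
int yywrap(void) { return 1; }   /* no further input files */

int main(void) {
    yylex();                     /* scan standard input until end of file */
    return 0;
}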
Lex Source to C Program
Lex vs. Yacc
Lex
Lex generates C code for a lexical analyzer, or
scanner
Lex uses patterns that match strings in the input and
converts the strings to tokens
Yacc
Yacc generates C code for a syntax analyzer, or parser.
Yacc uses grammar rules that allow it to analyze
tokens from Lex and create a syntax tree.
Implementation of lexical analyzer
From its input, Lex can build a lexical analyzer that behaves
roughly like a finite automaton.
regular definitions
(none)
Translation rules
a {} /*actions are omitted here */
abb {}
a*b+ {}