
Lecture# 02

Compiler Construction

Topics
• The Analysis Task for Compilation (Lexical, Hierarchical & Semantic Analysis)
• Supporting Phases (Symbol Table & Error Handler), The Synthesis task, Compilation process
• Loaders & Link Editors, Grouping of phases, Compiler Construction Tools
The Analysis Task for Compilation
• Three Phases:
– Linear / Lexical Analysis:
• L-to-R Scan to Identify Tokens
token: sequence of chars having a collective meaning
– Hierarchical Analysis:
• Grouping of Tokens Into Meaningful Collection

– Semantic Analysis:
• Checking to ensure Correctness of Components

Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens, which are the basic building blocks

For Example:
position := initial + rate * 60 ;

All are tokens: position, :=, initial, +, rate, *, 60 and ;

Blanks, Line breaks, etc. are scanned out
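
The left-to-right scan described above can be sketched in a few lines of Python. This is only an illustration: the token class names (ID, NUMBER, ASSIGN and so on) and the `tokenize` function are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token classes for the slide's example statement.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),   # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Scan text left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

if __name__ == "__main__":
    for token in tokenize("position := initial + rate * 60 ;"):
        print(token)
```

Note that the scan is a single loop with no recursion, which is why this phase is called linear analysis.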

Phase 2. Hierarchical Analysis
Parsing or Syntax Analysis
For previous example, we would have Parse Tree:

assignment statement  (:=)
    identifier: position
    expression  (+)
        expression
            identifier: initial
        expression  (*)
            expression
                identifier: rate
            expression
                number: 60

Nodes of tree are constructed using a grammar for the language


What is a Grammar?
• Grammar is a Set of Rules Which Govern the Interdependencies & Structure Among the Tokens

statement is an assignment statement, or
             while statement, or
             if statement, or ...

assignment statement is an identifier := expression ;

expression is an (expression), or
              expression + expression, or
              expression * expression, or
              number, or
              identifier, or ...
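
A minimal recursive-descent sketch of the assignment and expression rules above, in Python. Two assumptions are made for the example: the rules are layered into `expression`, `term` and `factor` so that * binds tighter than + (the grammar as written does not say which operator goes first), and the tree is returned as nested tuples. The function name `parse_assignment` is hypothetical.

```python
import re

def parse_assignment(source):
    """Parse 'identifier := expression ;' and return a nested-tuple tree."""
    # Crude token split for illustration; characters outside the pattern are ignored.
    tokens = re.findall(r":=|[A-Za-z_]\w*|\d+|[+*();]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def is_identifier(tok):
        return tok[0].isalpha() or tok[0] == "_"

    def factor():
        # factor is (expression), or number, or identifier
        tok = take()
        if tok == "(":
            node = expression()   # recursion handles nested structure
            take(")")
            return node
        if tok.isdigit():
            return ("number", tok)
        if is_identifier(tok):
            return ("identifier", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def expression():
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    target = take()
    if not is_identifier(target):
        raise SyntaxError(f"expected identifier, found {target!r}")
    take(":=")
    value = expression()
    take(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} after ';'")
    return (":=", ("identifier", target), value)

if __name__ == "__main__":
    print(parse_assignment("position := initial + rate * 60 ;"))
```

Unlike the scanner, the parser calls itself (`factor` calls `expression`), which is what lets it recognize arbitrarily nested expressions.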
Why Have We Divided Analysis in This Manner?
Lexical Analysis
• Scans Input, Its Linear Actions Are Not Recursive
• Identify Only Individual “words” that are the Tokens of the Language

Recursion Is Required to Identify Structure of an Expression, As Indicated in Parse Tree
• Verify that the “words” are Correctly Assembled into “sentences”

What is the Third Phase?
• Determine Whether the Sentences have One and Only One Unambiguous Interpretation
• … and do something about it!
Phase 3. Semantic Analysis
• Find More Complicated Semantic Errors and Support Code Generation
• Parse Tree Is Augmented With Semantic Actions

Compressed Tree:
:=
    position
    +
        initial
        *
            rate
            60

Conversion Action:
:=
    position
    +
        initial
        *
            rate
            inttoreal
                60
Phase 3. Semantic Analysis
• Most Important Activity in This Phase:
• Type Checking - Legality of Operands
• Many Different Situations:
Real := int + char ;
A[int] := A[real] + int ;
while char <> int do
… etc.
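
A small sketch of type checking with an inserted conversion, in Python. It assumes a nested-tuple tree and a `types` dictionary mapping identifier names to "int", "real" or "char"; both the representation and the function name `annotate` are made up for this example. The only conversion it allows is int to real, so an operand mix such as int + char is rejected.

```python
def annotate(node, types):
    """Return (tree_with_conversions, result_type) for an expression tree."""
    kind = node[0]
    if kind == "number":
        return node, ("real" if "." in node[1] else "int")
    if kind == "identifier":
        if node[1] not in types:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return node, types[node[1]]

    # Binary operator node: check both operands first.
    op, left, right = node
    left, left_type = annotate(left, types)
    right, right_type = annotate(right, types)

    if left_type == right_type:
        return (op, left, right), left_type
    if {left_type, right_type} == {"int", "real"}:
        # Wrap the int operand in a conversion node, as in the slide's tree.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    raise TypeError(f"illegal operands for {op!r}: {left_type} and {right_type}")

if __name__ == "__main__":
    tree = ("+", ("identifier", "initial"),
                 ("*", ("identifier", "rate"), ("number", "60")))
    print(annotate(tree, {"initial": "real", "rate": "real"}))
```

With `rate` declared real, the literal 60 is wrapped in an `inttoreal` node, matching the Conversion Action tree.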
Supporting Phases/Activities for Analysis
• Symbol Table Creation / Maintenance
– Contains Info (storage, type, scope, args) on
Each “Meaningful” Token, Typically Identifiers
– Data Structure Created / Initialized During Lexical Analysis
– Utilized / Updated During Later Analysis & Synthesis

• Error Handling
– Detection of Different Errors Which Correspond
to All Phases
– What Kinds of Errors Are Found During the Analysis Phase?
– What Happens When an Error Is Found?
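
The symbol table described above can be pictured as a stack of dictionaries, one per scope. The following Python sketch is an assumption-laden illustration: the class name `SymbolTable`, its method names, and the stored attributes (type, storage) are chosen for the example only.

```python
class SymbolTable:
    """Maps identifier names to their attributes, with nested scopes."""

    def __init__(self):
        self.scopes = [{}]            # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, **info):
        """Create an entry in the current scope (e.g. when first seen)."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(info)

    def lookup(self, name):
        """Search from the innermost scope outward; None if not found."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

if __name__ == "__main__":
    table = SymbolTable()
    table.declare("rate", type="real", storage="static")
    table.enter_scope()
    table.declare("rate", type="int", storage="local")   # shadows the outer entry
    print(table.lookup("rate"))
    table.exit_scope()
    print(table.lookup("rate"))
```

Later phases would update the same entries, for example to record an address assigned during code generation.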
The Synthesis Task For Compilation
• Intermediate Code Generation
– Abstract Machine Version of Code - Independent of
Architecture
• Easy to Produce and
• Easy to translate into target program
• Code Optimization
– Find More Efficient Ways to Execute Code
– Replace Code With More Optimal Statements
• Final Code Generation
– Generate Relocatable Machine Dependent Code

Reviewing the Entire Process
position := initial + rate * 60

lexical analyzer:
id1 := id2 + id3 * 60

syntax analyzer:
:=
    id1
    +
        id2
        *
            id3
            60

semantic analyzer:
:=
    id1
    +
        id2
        *
            id3
            inttoreal
                60

intermediate code generator

Symbol Table: position ...., initial …., rate….
Errors
Reviewing Entire Process
(Intermediate Form is three address Code)

3 address code (TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is typically a combination of assignment and a binary operator.

position := initial + rate * 60

intermediate code generator (3 address code):
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

code optimizer:
temp1 := id3 * 60.0
id1 := id2 + temp1

final code generator (Assembly code OR Machine code):
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol Table: position ...., initial …., rate….
Errors
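
The two middle steps above, intermediate code generation and optimization, can be sketched in Python. This is a toy under stated assumptions: the tree is a nested tuple, each instruction is a `(dest, op, arg1, arg2)` tuple, and the function names `generate_tac`, `optimize` and `show` are hypothetical. The optimizer only does the two rewrites visible in the example (folding `inttoreal` of a constant, and removing the final copy).

```python
from itertools import count

def generate_tac(assignment):
    """Flatten an assignment tree into (dest, op, arg1, arg2) instructions."""
    code = []
    temps = count(1)

    def emit(node):
        kind = node[0]
        if kind in ("identifier", "number"):
            return node[1]                      # leaves need no instruction
        dest_args = None
        if kind == "inttoreal":
            dest_args = ("inttoreal", emit(node[1]), None)
        else:                                   # binary operator node
            op, left, right = node
            dest_args = (op, emit(left), emit(right))
        dest = f"temp{next(temps)}"
        code.append((dest,) + dest_args)
        return dest

    _, target, value = assignment
    code.append((target[1], "copy", emit(value), None))
    return code

def optimize(code):
    """Fold inttoreal(constant) at compile time and drop a trailing copy."""
    constants = {}
    out = []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = a + ".0"          # e.g. 60 becomes 60.0
            continue
        out.append((dest, op, a, b))
    # "t := x op y" followed by "v := t" becomes "v := x op y".
    if len(out) >= 2 and out[-1][1] == "copy" and out[-1][2] == out[-2][0]:
        final_dest = out[-1][0]
        out[-2:] = [(final_dest,) + out[-2][1:]]
    return out

def show(code):
    """Render instructions in the slide's 'x := y op z' notation."""
    lines = []
    for dest, op, a, b in code:
        if op == "copy":
            lines.append(f"{dest} := {a}")
        elif op == "inttoreal":
            lines.append(f"{dest} := inttoreal({a})")
        else:
            lines.append(f"{dest} := {a} {op} {b}")
    return lines

if __name__ == "__main__":
    tree = (":=", ("identifier", "id1"),
            ("+", ("identifier", "id2"),
                  ("*", ("identifier", "id3"), ("inttoreal", ("number", "60")))))
    tac = generate_tac(tree)
    print("\n".join(show(tac)))
    print("\n".join(show(optimize(tac))))
```

The sketch does not renumber temporaries after optimizing, so its surviving temporary is named temp2 where the slide shows temp1; the instructions are otherwise the same.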
Loaders and Link-Editors
• Assembler:
– Takes a lower-level (assembly language) program and translates it into machine instructions understandable by the computer
• Loader: takes relocatable machine code,
– alters the addresses and
– places the altered instructions into memory.
• Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file.
– Needs to keep track of the correspondence between variable names and corresponding addresses in each piece of code.
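
The bookkeeping a link-editor does can be illustrated with a toy model in Python. Everything here is an assumption for the example: each module is a dictionary with a `size`, the symbols it defines (`defs`, as offsets inside the module) and the places where it refers to other symbols (`refs`). Real object file formats are considerably more involved.

```python
def link(modules):
    """Lay modules out one after another and resolve cross-references.

    Returns (symbols, patches): symbols maps each name to its final address,
    and patches lists (address_to_fix, address_of_symbol) pairs.
    """
    base = 0
    bases = {}
    symbols = {}

    # First pass: assign a base address to each module and record definitions.
    for module in modules:
        bases[module["name"]] = base
        for symbol, offset in module["defs"].items():
            if symbol in symbols:
                raise ValueError(f"duplicate definition of {symbol!r}")
            symbols[symbol] = base + offset     # relocate by the module's base
        base += module["size"]

    # Second pass: resolve every reference against the combined table.
    patches = []
    for module in modules:
        for offset, symbol in module["refs"]:
            if symbol not in symbols:
                raise ValueError(f"undefined symbol {symbol!r}")
            patches.append((bases[module["name"]] + offset, symbols[symbol]))
    return symbols, patches

if __name__ == "__main__":
    main = {"name": "main", "size": 10, "defs": {"main": 0}, "refs": [(4, "square")]}
    lib = {"name": "lib", "size": 6, "defs": {"square": 2}, "refs": []}
    print(link([main, lib]))
```

A loader performs a similar address adjustment when it places the single linked file into memory at its actual load address.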

The Grouping of Phases
Front End : Analysis + Intermediate Code Generation
vs.
Back End : Code Generation + Optimization

Number of Passes:
A pass: requires reading and writing intermediate files

Fewer passes: more efficiency.

However: fewer passes require more sophisticated memory management and compiler phase interaction.
Tradeoffs ……..

Compiler Construction Tools
Scanner Generators: Produce Lexical Analyzers
Parser Generators: Produce Syntax Analyzers
Syntax-directed Translation Engines: Generate Intermediate Code
Automatic Code Generators: Generate Actual Code
Data-Flow Engines: Support Optimization

The End
