Day School – 1
by
Gehan Anthonys
Today’s lecture …
• General Information
• Course Aim and Learning Outcomes
• Course Syllabus
• Introduction
- Why study Compilers?
- Quick history of Compilers
- Some applications
• General Information
Academic/ Course Coordinator: Gehan Anthonys
Email: ganth@ou.ac.lk
Postal Address:
Course Coordinator – EEX6335 or EEX6363
Department of Electrical & Computer Engineering
The Open University of Sri Lanka
P.O. Box 21, Nawala
Nugegoda.
• You can visit the page “FET2021 Faculty Class for Engineering
Technology Students” on LearnOUSL.
Design Project:
Apply theory of computation to design a compiler
for a target application.
Very Important:
# Late submissions will not be accepted;
# DO NOT send the answer scripts for TMAs and DP via email;
# Viva for each TMA and DP is compulsory;
# Clear and brief answers without unnecessary details will gain
maximum marks.
Hourly Breakdown
3 DP-RT × 8 h = 24 h
• Course Aim and Learning Outcomes
Activity Mapping
Recommended Reading:
• Course Syllabus
Do you have any questions/clarifications?
… few minutes.
• Introduction
A brief history:
• First, there was nothing; then, there was machine code; next,
there were assembly languages.
• 1954 – IBM develops the 704 (software cost > hardware cost)
• 1970s ~ 1990s – Some tools: Lex & Yacc, Flex & Bison, etc.
• Program Translations
- Binary Translation (in x86 code), Compiled Simulation, Hardware
Synthesis (Verilog or VHDL), Database Query Interpreters (SQL)
• Software Languages
Comparison: Interpreter vs. Compiler
Interpreter:
- translates one statement of the program at a time into machine code.
- takes very little time to analyze the source code; however, the overall execution is much slower.
- keeps translating the program until the first error is encountered; it then stops, so debugging is easy.
Compiler:
- scans the entire program and translates the whole of it into machine code at once.
- takes a lot of time to analyze the source code; however, the overall execution is much faster.
- generates error messages only after scanning the complete program, so debugging is relatively harder.
The basic steps in software language processing:
Example:
c = next();
if (c == '\\') {          /* a backslash starts an escape sequence */
    c = next();
    if (c == 'n')
        return('\n');
}
c = next();
if (c == '\\') {
    c = next();
    if (c == 'n')
        return(10);       /* 10 is the ASCII code of newline */
}
[Figure: Program, High Level Language]
• Anatomy of a Compiler
Front end (machine independent):
Program (character stream)
  ↓ Lexical Analyzer (Scanner)
Token Stream
  ↓ Syntax Analyzer (Parser)
Parse Tree
  ↓ Semantic Analyzer
Verified Parse Tree
  ↓ Intermediate Representation Code Generator
Intermediate Representation
Shared by the phases: Symbol table, Error handling
Lex/Yacc can generate program fragments that solve two tasks: reading the source
program and discovering its structure. There are more tools …
• “Lex - A Lexical Analyzer Generator” (M. E. Lesk and E. Schmidt, July 21, 1975):
Lex is a program generator designed for lexical processing of character input streams
(i.e., splitting the source file into tokens). Lex is good at pattern matching.
• “Bison, The Yacc-compatible Parser Generator” (Charles Donnelly and Richard Stallman):
Bison is a general-purpose parser generator that converts a grammar description for an
LALR(1) context-free grammar into a C program to parse that grammar.
A basic example:
Note that:
‘+=’ is a single token, NOT two
separate tokens, because ...
• Once the structure of the sentence is clear, we need to
understand its meaning; this is called semantic analysis.
Humans manage this activity quite well;
the same is not so true for machines.
• IR optimization is now the most complex and effort-prone
activity in the construction of modern compilers.
Note:
The two values a and b
are constants and do
not change during the
loop, thus, …
• Code optimization faces many undecidable problems, so theory
alone is not enough and we need heuristics, good engineers,
and good programmers.
As a summary . . .
• Lexical analysis (Scanning):
Identify logical pieces of the description.
• Syntax analysis (Parsing):
Identify how those pieces relate to each other.
• Semantic analysis:
Identify the meaning of the overall structure.
• IR Generation:
Design one possible structure.
• IR Optimization:
Simplify the intended structure.
• Code Generation:
Fabricate the structure.
• Optimization:
Improve the resulting structure.
Do you have any questions/clarifications?
… few minutes.
• Lexical Analysis
- Strings and Languages
- Regular Expressions
- Finite Automata
• Nondeterministic FA
• Deterministic FA
- Transition table
• Lexical Analysis
Also called scanning: it takes the input program as a string of characters and converts it into tokens.
In the Lexical stage:
Consider the following examples:
Q&As
Q1: If an alphabet consists of the symbols {a, b, c}, construct
the strings s over the alphabet such that |s| = 2.
Answer:
Since no rules for forming the strings are given, any string
of length 2 qualifies. Thus, the answer is
ab, bc, ac, aa, bb, cc, … (3^2 = 9 strings in total)
Q2: For the same alphabet, what are the strings s that satisfy
|s| = 1?
Answer:
a, b, c
Q3: Find the language of all strings of length 0 over the alphabet
Σ = {a, b, c}.
Answer:
L = { s | s ∈ Σ* and |s| = 0 }
∴ L = {ε} ≠ ∅
(Kleene star)
Examples:
Consider the alphabet {0, 1}. Then the REs for
• all strings that represent binary numbers divisible by four
(accepting 0): ((0|1)*00)|0
• all strings that do not contain “01” as a substring: 1*0*
REs: Properties
• Basis symbols:
- ε is a regular expression denoting the language L(ε) = {ε};
- a is a regular expression denoting L(a) = {a}, for each symbol a in Σ.
Algebraic laws for REs
Law                              Description
r|s = s|r                        alternation is commutative
r|(s|t) = (r|s)|t                alternation is associative
r(st) = (rs)t                    concatenation is associative
r(s|t) = rs|rt; (s|t)r = sr|tr   concatenation distributes over alternation
εr = rε = r                      ε is the identity for concatenation
r* = (r|ε)*                      ε is guaranteed in a closure
r** = r*                         * is idempotent
E.g., regular definitions:
• digit → [0-9]
• num → digit+ (. digit+)? (E (+|-)? digit+)?
Regular Languages:
• The languages that can be described by regular expressions
are called regular languages or regular sets, i.e., a language is
regular if it can be expressed by a regular expression,
e.g., { a^m b^n : m and n are positive integers }, described by aa*bb*.
• The set of regular languages: each element is a regular language.
Examples:
Give the English descriptions of the following languages:
• ((0|1)(0|1)(0|1)(0|1)(0|1))+ : all strings whose length is a positive multiple of 5
• (01)*|(10)*|(01)*0|(10)*1 : all alternating binary strings
• (00|0000)* : strings with an even number of 0’s
• (000|00|1)* : any string of 0’s and 1’s with no single (isolated) 0’s
In the Lexical Analyzer
• Initially, the lexer performs the scanning process, during which
it reads the source code character by character.
Scanning process in Lexer
Note:
REs, NFAs, and DFAs describe exactly
the same class of languages (the regular languages)!
E.g., the components used to construct a finite automaton that
recognizes (abc+)+:
How an FA works:
• The machine starts in the start (initial) state;
• Repeat until the end of the string is reached:
- Scan the next symbol s of the string;
- Take the transition edge labeled with s.
• The string is accepted if the automaton is in a final state when
the end of the string is reached.
N = (Q, Σ, δ, q0, F),
where Q = finite set of states;
Σ = finite input alphabet;
δ = transition function from Q × Σ to 2^Q (the power set of Q);
q0 ∈ Q = initial/start state;
F ⊆ Q = set of final/accepting states.
Note:
Transitions can be labeled with ε, meaning that states can be
reached without reading any input, i.e., δ : Q × (Σ ∪ {ε}) → 2^Q.
• Thompson’s construction is an algorithm
invented by Ken Thompson in 1968
to translate regular
expressions into an NFA.
23 February 2021 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction •51
E.g., the set of all binary strings that represent numbers divisible by four
(including 0 in this set) can be defined by the regular expression:
((0|1)*00) | 0
Apply Thompson’s rules to create an NFA.
Solution:
[Figure: NFA for ((0|1)*00) | 0 constructed with Thompson’s rules]
Deterministic Finite Automaton (DFA)
D = (Q, Σ, δ, q0, F),
where Q = finite set of states; Σ = finite alphabet;
δ = transition function from Q × Σ to Q;
q0 ∈ Q = initial/start state; F ⊆ Q = set of final/accepting states.
• An NFA
- can have zero, one, or more than one move from a given state
on a given input symbol;
- can also have NULL (ε) moves, i.e., moves without an input symbol.
• A DFA
- has exactly one move from a given state on a given input
symbol.
E.g., Consider the following NFA and construct the equivalent DFA:
Solution:
From the figure, the NFA is (Q, Σ, δ, q0, F), where
δ is as given in the figure and F = {q2}.
Step 1: Initially, Q’ = ∅.
Step 2: Q’ = {q0}.
Step 3: For each state in Q’, find the set of states reached on each input symbol.
• The moves from state {q0, q1} on the input symbols are not yet in the
transition table of the DFA, so we compute them:
• {q0, q2} is now treated as a single state. Since it is not yet in Q’,
add it to Q’. So,
Q’ = { {q0}, {q0, q1}, {q0, q2} }
• Next, the moves from state {q0, q2} on the input symbols are not yet in
the transition table of the DFA, so we compute them:
• Since no new state is generated, the construction stops. The final state of
the DFA is the state that has q2 as a component, i.e., {q0, q2}.
Practice:
Convert the following REs to NFAs and then to DFAs:
• (0|1)*11|0*
• Strings of alternating 0s and 1s
• aba*|(ba|b)
[Figures: automata for ((0|1)*00) | 0]
Minimization of DFA using the Equivalence Theorem
Before applying this method, eliminate all dead states and
inaccessible states (if any) from the given DFA:
• Dead State
All non-final states
that transit to themselves for all
input symbols in Σ are
called dead states.
• Inaccessible State
All states that can
never be reached from the
initial state are called
inaccessible states.
State 0 1
A B D
B C D
C C D
D E D
E C D
Practice:
Apply the equivalence theorem to minimize the following DFAs:
Problem 1:
After eliminating inaccessible states
Implementing DFAs
• A DFA can be implemented by a 2D table indexed by “states” and “input
symbols”. This is called a transition table;
• For every transition δ(q, a) = p, store p in the entry for row q and column a;
• DFA execution: for the current state and input symbol, read the
corresponding entry in the table and move to that state;
• Very efficient method (one table lookup per input symbol);
E.g.,
… few minutes.