
EEX6335 – Compiler Design

EEX6363 – Compiler Construction

Day School – 1

by
Gehan Anthonys

Bachelor of Technology Honours in Engineering


Bachelor of Software Engineering Honours

Department of Electrical and Computer Engineering


Faculty of Engineering
The Open University of Sri Lanka
31 January 2021

Today’s lecture …
• General Information
• Course Aim and Learning Outcomes
• Course Syllabus
• Introduction
- Why study Compilers?
- Quick history of Compilers
- Some applications

• The Anatomy of a Compiler
- with a basic example.

• Detailed design of the first phase of a compiler: Lexical Analyzer
23 February 2021 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 2

• General Information
Academic/ Course Coordinator: Gehan Anthonys

Email: ganth@ou.ac.lk

Phone: 2881272 (the department)

Postal Address:
Course Coordinator – EEX6335 or EEX6363
Department of Electrical & Computer Engineering
The Open University of Sri Lanka
P.O. Box 21, Nawala
Nugegoda.

Moodle: the course page can be accessed via ‘LearnOUSL’ at https://learnousl.ou.ac.lk/

• You can visit the page of “FET2021 Faculty Class for Engineering
Technology Students” on LearnOUSL.


Course materials: Book 1, Book 2 & Supplementary

Continuous Assessment activities: six


• Tutor Marked Assignments (TMAs): 03 (with Viva)
-- TMAs are based on the Design Project --

• Continuous Assessment Tests (CATs): 02
CAT #1 is based on Book 1
Closed book type
CAT #2 is based on Books 1 & 2

• Design Project (DP): 01 (with Viva)

CA mark: [CA] = [Average of the best 5 out of 6 activities]

Eligibility criteria: If [CA] >= 40% AND [DP] >= 40%

LearnOUSL: Upload your answer scripts through the Moodle LMS at
https://learnousl.ou.ac.lk/ by the due dates.

Design Project:
Apply the theory of computation to design a compiler
for a target application.

• Implement a lexical analyzer, syntax analyzer, semantic
analyzer and code generator for the target application;

• Evaluate the grammar based on the target application to
verify the regular expression of the grammar.


Very Important:
# Late submissions will not be accepted;
# DO NOT send the answer scripts for TMAs and DP via email;
# A viva for each TMA and for the DP is compulsory;
# Clear and brief answers without unnecessary details will gain
maximum marks.

Final Examination – Closed book, essay type questions (Three hours)

Hourly Breakdown

Theory:                      16 SSS * 2 h = 32 h
Activity Hours:              5 DS * 2 h = 10 h; 1 ONLS * 5 h = 5 h; 1 DP * 30 h = 30 h
Independent Learning Hours:  16 SSS * 3 h = 48 h
Assessments:                 2 CAT * 1.25 h = 2.5 h; 3 TMA * 2 h = 6 h; 3 DP-RT * 8 h = 24 h
Total:                       130 h

• Course Aim and Learning Outcomes

To provide the knowledge and skills required
to develop a compiler for real applications.

Programme Learning Outcomes (PLO):

• PLO1: Apply knowledge of mathematics, basic sciences and engineering
fundamentals to the analysis of complex engineering problems.
• PLO3: Design systems, components or processes that meet specified needs.
• PLO5: Create, select and apply appropriate techniques, resources, and
modern engineering and IT tools to complex engineering activities.
• PLO10: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to
comprehend and write effective reports and design documentation, make
effective presentations, and give and receive clear instructions.
• PLO12: Engage in independent and lifelong learning in the broad context of
technological change.


Course Learning Outcomes (CLO):

On completion of this course, the student will be able to:


• CLO1: Develop grammar for a compiler by analyzing a target application.
[PLO1] [PLO3] [PLO5] [PLO10] [PLO12]
• CLO2: Apply the principles of theory of computation to develop the compiler
based on the grammar developed. [PLO1] [PLO3] [PLO5] [PLO10] [PLO12]
• CLO3: Create scanner, parser and code generator of the compiler using
LEX and YACC tools. [PLO1] [PLO3] [PLO5] [PLO10] [PLO12]
• CLO4: Construct the compiler of the target application. [PLO1] [PLO3]
[PLO5] [PLO10] [PLO12]
• CLO5: Validate the constructed compiler with a selected set of samples for
the target application. [PLO1] [PLO5] [PLO10] [PLO12]

Activity Mapping

Activity CLO1 CLO2 CLO3 CLO4 CLO5

Self-instructional material S1, S5, S6 S2-S4 S15-S16 S5-S16 S10-S14

Tutor Marked Assignment 1 √ √ √


Tutor Marked Assignment 2 √ √ √
Tutor Marked Assignment 3 √ √ √
Continuous Assessment Test 1 √ √
Continuous Assessment Test 2 √ √ √
Design Project √ √ √ √ √
Final Exam √ √ √ √ √

Recommended Reading:

# Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles,
Techniques and Tools. [dragon book]
# Appel, Andrew W., Modern compiler implementation in C. [tiger book]
# Louden, Kenneth C., Compiler construction: Principles and Practice.
# Das, Vinu V., Compiler design using FLEX and YACC.


• Course Syllabus

Unit 1: Theory of Computation


Session 01: Grammars: Properties of Context Free Grammars
Session 02: Finite State Automata and Regular Expressions
Session 03: Pushdown Automata
Session 04: Turing Machines

Unit 2: Engineering a Compiler


Session 05: An Overview of a compiler
Session 06: Lexical analysis
Session 07: Syntax analysis
Session 08: Semantic analysis
Session 09: Intermediate code generation
Session 10: Run-time environments
Session 11: Local optimizations
Session 12: Machine code generation
Session 13: Global register allocation
Session 14: Machine-independent optimization
Session 15: Overview of LEX
Session 16: Overview of YACC

Do you have any questions/clarifications?

… few minutes.


• Introduction

Why study compilers?

• The functionality of computer hardware is controlled by compatible
software. To instruct the hardware, code must be written in binary
format (a series of 1s and 0s). Writing such code by hand would be a
difficult and unmanageable task for programmers, which is why we have
compilers to generate it.
E.g., C7 06 0000 0002 ← MOV x, 2 ← x = 2

• Build a large, ambitious software system; see how theory
comes to life; learn
- how to build programming languages;
- how programming languages work;
- the tradeoffs in language design,
e.g. reliability, cost, efficient control of hardware, …

A brief history:

• First, there was nothing; then, there was machine code; next,
there were assembly languages.

• Programming was expensive; 50% of the cost of a machine went into
programming.

• 1954 – IBM develops the 704 (software cost > hardware cost)

• 1954 ~ 1957 – FORTRAN I (FORmula TRANslating system)
is developed (in 1958, 50% of code is written in FORTRAN)

• 1960s ~ 1970s – Theories and algorithms: FA, CFG, etc

• 1970s ~ 1990s – Some tools: Lex & Yacc, Flex & Bison, etc


Applications of compiler technology:

Compiler design impacts several areas such as

• Implementation of High-Level Programming Languages
- C, Fortran, Cobol, Smalltalk, C++, C#, Java, etc.

• Optimizations for Computer Architectures
- Parallelism, Memory Hierarchies

• Design of New Computer Architectures
- RISC (Reduced Instruction-Set Computer) architecture;
Specialized Architectures: data flow machines, vector machines,
VLIW (Very Long Instruction Word) machines, SIMD (Single
Instruction, Multiple Data) arrays of processors

• Program Translations
- Binary Translation (in x86 code), Compiled Simulation, Hardware
Synthesis (Verilog or VHDL), Database Query Interpreters (SQL)

• Software Productivity Tools
- Type Checking, Bounds Checking, Memory-Management Tools.

• Software Languages

How are software languages implemented?

Interpreters run programs “as is” (less preprocessing); they carry
out the meaning of a program.
E.g., LISP, BASIC, PHP, …

Compilers transform a program in a (higher-level) language into an
efficient program in a (lower-level) language, preserving the meaning
(they do extensive preprocessing).
E.g., C/C++, Java, …


Comparison:

Interpreter
- translates just one statement of the program at a time into machine code.
- takes very little time to analyze the source code. However, the overall
time to execute the process is much slower.
- does not generate intermediary object code. Hence, an interpreter is
highly efficient in terms of memory.
- keeps translating the program until the first error is encountered. If any
error is spotted, it stops working, and hence debugging becomes easy.

Compiler
- scans the entire program and translates the whole of it into machine code
at once.
- takes a lot of time to analyze the source code. However, the overall time
taken to execute the process is much faster.
- always generates intermediary object code, which needs further linking.
Hence more memory is needed.
- generates error messages only after it scans the complete program, and
hence debugging is relatively harder while working with a compiler.

The basic steps in software language processing:

• The hardware understands a language that humans cannot readily
understand. So we write programs in a high-level language,
which is easier for us to understand and remember.


Example:
c = next();
if (c == '\\') {
    c = next();
    if (c == 'n')
        return('\n');
}

ERROR: '\n' not a valid character

c = next();
if (c == '\\') {
    c = next();
    if (c == 'n')
        return(10);
}


The basic components in a Language Processing System:

• Software programs are fed into a series of tools and OS components
to get the desired code that can be used by the machine.

Program (High-Level Language)
  → Compiler: translates high-level language to assembly language
  → Assembly Language
  → Assembler: translates assembly language to machine language
  → Linker: builds an executable file from a collection of object files
  → Loader: loads an executable file into memory and starts it
  → Memory

• Anatomy of a Compiler

Program (character stream)
  → Lexical Analyzer (Scanner) → Token Stream
  → Syntax Analyzer (Parser) → Parse Tree
  → Semantic Analyzer → Verified Parse Tree
  → Intermediate Representation Code Generator → Intermediate Representation
  → Intermediate Code Optimizer → Optimized Intermediate Representation
  → Code Generator → Assembly code
  → Machine Code Optimizer → Machine code

Front end: the machine-independent phases, starting at the Lexical Analyzer.
Back end: the machine-dependent phases, ending at the Machine Code Optimizer.
The symbol table and error handling are shared by the phases.

Some tools for compiler writers

Lex/Yacc can generate program fragments that solve two tasks: reading the source
program and discovering its structure. There are more tools …

• Lex - A Lexical Analyzer Generator by M. E. Lesk and E. Schmidt, July 21, 1975
is a program generator designed for lexical processing of character input streams
(i.e., splitting the source file into tokens). Lex is good at pattern matching.

• Yacc - Yet Another Compiler-Compiler by Stephen C. Johnson, July 31, 1978
provides a general tool for describing the input to a computer program. The Yacc user
specifies the structure of the input (i.e., Yacc finds the hierarchical structure of the
program). Yacc is appropriate for more challenging tasks.
--------------------------------------------------------------------------
• Flex, A fast scanner generator by Vern Paxson
is a tool for generating scanners: programs which recognize lexical patterns in text.

• Bison, The Yacc-compatible parser generator by Charles Donnelly and Richard Stallman
is a general-purpose parser generator that converts a grammar description for an
LALR(1) context-free grammar into a C program to parse that grammar.

A basic example:

Now, let’s consider what is happening in each stage of the compiler.

• Lexical analysis divides the program text into words and
produces a sequence of tokens ([token-name, attribute-value]).

Note that:
‘+=’ is a single token, NOT two separate tokens, because ...


• Having understood the words, we need to understand the
sentence structure; this is Syntax analysis.

• Once the structure of the sentence is clear, we need to
understand its meaning; this is called Semantic analysis.
Humans manage this activity quite well;
the same is not so true for machines.

Compilers perform many semantic checks besides variable bindings.


• IR Generation produces a code format that is easy to produce and easy to translate.
Typically based on a three-address code form:
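
As a rough illustration of the three-address form, the sketch below flattens a nested expression into instructions that each have at most one operator and one result. The tuple-based tree, the function name `to_three_address` and the temporary names `t1`, `t2`, … are assumptions made for this example only; they are not taken from the course material.

```python
import itertools

def to_three_address(expr):
    """Flatten a nested (op, left, right) tuple into three-address instructions."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        # A plain string is a variable name or literal: it is already an address.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_addr = emit(left)
        right_addr = emit(right)
        temp = f"t{next(counter)}"          # fresh temporary for this sub-result
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    result = emit(expr)
    return code, result

# x = a + b * c, written as a hypothetical expression tree
tree = ("+", "a", ("*", "b", "c"))
code, result = to_three_address(tree)
code.append(f"x = {result}")
print("\n".join(code))
```

Each emitted line names one result and at most two operands, which is what makes this form easy to translate further.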

• IR Optimization is now the most complex and effort-intensive activity in the
construction of modern compilers.

• The compiler modifies the program so that it
o runs faster;
o uses less
- memory;
- power;
- bandwidth;
o makes fewer database accesses, . . .

Note :
The two values, a and b, are constants and do not change during the
loop, thus, …


• Code Generation produces assembly code to be run on the target machine.

• The proportions of the various phases have changed since the pioneering era.

Add the contents of $2 and $3,
then put the result in $1;

• Code optimization faces many undecidable problems, so theory
alone is not enough and we need heuristics, good engineers,
and good programmers.


As a summary . . .
• Lexical analysis (Scanning):
Identify logical pieces of the description.
• Syntax analysis (Parsing):
Identify how those pieces relate to each other.
• Semantic analysis:
Identify the meaning of the overall structure.
• IR Generation:
Design one possible structure.
• IR Optimization:
Simplify the intended structure.
• Code Generation:
Fabricate the structure.
• Optimization:
Improve the resulting structure.

Do you have any questions/clarifications?

… few minutes.


Now let's consider a detailed analysis of each step in a compiler:

• Lexical Analysis
- Strings and Languages
- Regular Expressions
- Finite Automata
• Nondeterministic FA
• Deterministic FA
- Transition table

• Lexical Analysis

Also called scanning: it takes the input program string and converts it into tokens.

E.g., double f = sqrt(-1);
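
A minimal sketch of how such a statement might be split into (token, lexeme) pairs is given below. The token names, the keyword list and the decision to treat `-` as a separate operator token are assumptions chosen for illustration; a real scanner for the course project would be specified in LEX.

```python
import re

# Hypothetical token classes; earlier entries win when several could match.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:double|int|if|else|while)\b"),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),   # letters or digits, starting with a letter
    ("OP",      r"\+=|[-+*/=]"),            # '+=' is listed before the single-character operators
    ("PUNCT",   r"[();]"),
    ("SKIP",    r"\s+"),                    # whitespace is recognized but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_name, lexeme) pairs for the input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in tokenize("double f = sqrt(-1);"):
    print(token)
```

Note that the scanner happily tokenizes `sqrt(-1)`; whether the call is meaningful is a question for later phases.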


Token vs. Lexeme vs. Attribute

Token: A syntactic category/grouping
• In English: noun, verb, adjective, …
• In a programming language:
identifier - strings of letters or digits starting with a letter,
integer - a non-empty string of digits,
keyword - “else”, “if”, “while”, ...
whitespace and punctuation – ‘ ‘, ‘;’, ‘[’, …

Lexeme: Concrete instance of a token in the text.
• In a case-insensitive language, the lexemes associated with
the IF token are: if, IF, iF & If.

Attribute: “Value of interest” about a token.
• Numerical value of an integer token.
• Name (string) associated with an identifier token.

In the Lexical stage:


Strings and Languages

Consider the following examples:

• Phone number: (011)-288-1272

• Email Address: ganth@ou.ac.lk


Q&As
Q1: If an alphabet consists of the symbols {a, b, c}, construct
the strings s over the alphabet such that |s| = 2.
Answer:
Since no rules for forming the strings are given, we can
take any arbitrary string of length 2. Thus, the answer is
ab, bc, ac, aa, bb, cc, …
Q2: For the same alphabet, what are the strings s that satisfy the
condition |s| = 1?
Answer:
a, b, c
Q3: For the set of strings of length 0 over the alphabet ∑ = {a, b, c},
the language is given by
L = { s | s ∈ ∑* and |s| = 0 }
∴ L = {ε} ≠ ϕ
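
The three answers can be checked by enumerating every string of a given length over the alphabet. The helper name `strings_of_length` is an assumption for this sketch; the empty Python string stands in for ε.

```python
from itertools import product

def strings_of_length(alphabet, k):
    """All strings s over the alphabet with |s| = k, in sorted order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]

sigma = {"a", "b", "c"}
print(strings_of_length(sigma, 2))   # the 3 * 3 = 9 strings of length 2
print(strings_of_length(sigma, 1))   # the three one-symbol strings
print(strings_of_length(sigma, 0))   # one string: the empty string, so L = {ε} is not empty
```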

( Kleene star )

Examples:
Consider the alphabet { 0, 1 }; then the RE for
• All strings that represent binary numbers divisible by four
(0 is also accepted) → ((0|1)*00)|0
• All strings that do not contain “01” as a substring → 1*0*
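
One way to gain confidence in such expressions is to compare them against a direct definition on every short binary string. The sketch below does this with Python's `re` module for all strings of length 1 to 8; the bound of 8 and the helper name `binary_strings` are arbitrary choices for the example, and an exhaustive check up to a bound is evidence, not a proof.

```python
import re
from itertools import product

DIV_BY_FOUR = re.compile(r"((0|1)*00)|0")
NO_01 = re.compile(r"1*0*")

def binary_strings(max_len):
    """Yield every non-empty binary string of length at most max_len."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

for s in binary_strings(8):
    # The RE must accept exactly the strings whose binary value is a multiple of 4.
    assert (DIV_BY_FOUR.fullmatch(s) is not None) == (int(s, 2) % 4 == 0)
    # The RE must accept exactly the strings that never have a 0 followed by a 1.
    assert (NO_01.fullmatch(s) is not None) == ("01" not in s)

print("both expressions agree with their descriptions up to length 8")
```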


REs: Properties

• Basis symbols:
• ε is a regular expression denoting the language L(ε) = {ε}
• a ∈ ∑ is a regular expression denoting L(a) = {a}

• If r and s are regular expressions denoting the languages
L(r) and L(s) respectively, then
• r|s is a regular expression denoting L(r) ∪ L(s)
• rs is a regular expression denoting L(r)L(s)
• r* is a regular expression denoting L(r)*
• (r) is a regular expression denoting L(r)

• A language defined by a regular expression is called a
regular set.

Algebraic laws for REs

Law                    Description
r|s = s|r              alternation is commutative
r|(s|t) = (r|s)|t      alternation is associative
r(st) = (rs)t          concatenation is associative
r(s|t) = rs|rt         concatenation distributes over alternation
(s|t)r = sr|tr         concatenation distributes over alternation
εr = rε = r            ε is the identity for concatenation
r* = (r|ε)*            ε is guaranteed in a closure
r** = r*               * is idempotent


Extensions of Regular definitions:

• One or more instances
• r+ = rr* = r*r
• r* = r+ | ε
• Zero or one instance
• r? = r | ε
(a character followed by “?” in an RE matches
zero or one instance of that character.)
• Character classes
• [a-z] = a|b|c|…|z
• [A-Za-z] = A|B|…|Z|a|…|z

RE            Matches
a?bc          abc, bc
colou?r       color, colour
computers?    computer, computers
A?            A, ε (the empty string)

E.g.,
• digit → [0-9]
• num → digit+ (. digit+)? ( E (+|-)? digit+ )?
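
The `num` definition translates almost symbol for symbol into Python's `re` syntax, which makes it easy to try on sample inputs. In this sketch the names `DIGIT` and `NUM` are illustrative, the dot is escaped because an unescaped `.` matches any character in `re`, and `[+-]` is used as the character-class spelling of (+|-).

```python
import re

DIGIT = r"[0-9]"
# num -> digit+ (. digit+)? (E (+|-)? digit+)?
NUM = re.compile(rf"{DIGIT}+(\.{DIGIT}+)?(E[+-]?{DIGIT}+)?")

for text in ["42", "3.14", "6.02E+23", "1E-9", ".5", "1.", "1E", "E5"]:
    verdict = "num" if NUM.fullmatch(text) else "not a num"
    print(f"{text!r}: {verdict}")
```

The last four inputs are rejected because the definition requires at least one digit before the point, after the point, and after the exponent marker.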

Regular Languages:
• The languages that can be described using regular expressions
are the regular languages or regular sets, i.e., a language is regular
if it can be expressed in terms of a regular expression.
e.g., { a^m b^n : m and n are positive integers }
• The set of regular languages: each element is a regular language.

Examples:
Give the English descriptions of the following languages:
• ((0|1)(0|1)(0|1)(0|1)(0|1))+ → All non-empty strings whose length is a multiple of 5
• (01)*|(10)*|(01)*0|(10)*1 → All alternating binary strings
• (00|0000)* → Strings of 0’s of even length
• (000|00|1)* → Any string of 0's and 1's with no single (isolated) 0’s


In Lexical Analyzer
• Initially, the lexer undergoes the scanning process, during which
it reads the source code character by character.
[Figure: Scanning process in Lexer]

• After scanning the source code, the lexer carries out the analysis
phase. This stage analyzes the single characters to build
meaningful syntax based on the lexical grammar.
[Figure: Analysing stage in Lexer]
Note:
REs, NFAs, and DFAs
accept the same languages!


• We can implement a regular expression by turning it into a finite
automaton, i.e., a “machine” for recognizing a regular language.

Finite Automata symbols

E.g., Finite automaton components can be combined to construct an automaton that
recognizes (abc+)+

How an FA works:
• The machine starts in the start (initial) state;
• Repeat until the end of the string is reached:
- Scan the next symbol s of the string
- Take the transition edge labeled with s
• The string is accepted if the automaton is in a final state when
the end of the string is reached.
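
The loop described above can be sketched in a few lines. The automaton below is one possible deterministic automaton for (abc+)+, written by hand for this example; its state numbers and the function name `accepts` are assumptions and need not match the slide's diagram. A missing edge is treated as immediate rejection.

```python
def accepts(transitions, start, finals, text):
    """Run a deterministic FA over text; a missing edge means rejection."""
    state = start
    for symbol in text:
        state = transitions.get((state, symbol))   # take the edge labeled with the symbol
        if state is None:
            return False
    return state in finals                         # accept only if we end in a final state

# Hypothetical automaton for (abc+)+ : 0 -a-> 1 -b-> 2 -c-> 3, then 3 -c-> 3 and 3 -a-> 1
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 3,
    (3, "c"): 3,
    (3, "a"): 1,
}

for text in ["abc", "abccc", "abcabcc", "ab", "abca", ""]:
    print(repr(text), accepts(TRANSITIONS, 0, {3}, text))
```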


Non-Deterministic Finite Automaton (NFA)

N = { Q, ∑, δ, q0, F },
where Q = Finite set of states;
∑ = Finite alphabet;
δ = Transition function from Q × ∑ to 2^Q;
q0 = Initial/start state;
F = Set of final/accepting states.

Note:
Transitions can be labeled with ε, meaning states can be
reached without reading any input, i.e., δ maps Q × (∑ ∪ {ε}) to 2^Q.
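
Because δ returns a set of states, an NFA can be simulated by tracking the set of states it could currently be in, extended by everything reachable through ε-moves. The sketch below assumes a dictionary encoding of δ and uses the empty string as the label for ε; the small NFA (for a?b) and all names are made up for illustration.

```python
EPS = ""   # label used in this sketch for ε-moves

def epsilon_closure(delta, states):
    """All states reachable from the given states using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def nfa_accepts(delta, start, finals, text):
    """Simulate the NFA by tracking every state it could be in."""
    current = epsilon_closure(delta, {start})
    for symbol in text:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(delta, moved)
    return bool(current & finals)

# Hypothetical NFA for a?b: q0 reaches q1 either on 'a' or on ε, and q1 reaches q2 on 'b'.
DELTA = {
    ("q0", "a"): {"q1"},
    ("q0", EPS): {"q1"},
    ("q1", "b"): {"q2"},
}

for text in ["ab", "b", "a", "abb"]:
    print(repr(text), nfa_accepts(DELTA, "q0", {"q2"}, text))
```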

• Thompson’s construction is an algorithm invented
by Ken Thompson in 1968
to translate regular
expressions into an NFA.


E.g., Set of all binary strings that are divisible by four (include
0 in this set) can be defined by the regular expression:
((0|1)*00) | 0
Apply Thompson’s Rules to create an NFA.

Solution:

[Figures: step-by-step Thompson construction of the NFA for ((0|1)*00) | 0]

Deterministic Finite Automaton (DFA)

D = { Q, ∑, δ, q0, F },
where Q = Finite set of states; ∑ = Finite alphabet;
δ = Transition function from Q × ∑ to Q;
q0 = Initial/start state; F = Set of final/accepting states.

• An NFA
- can have zero, one or more than one move from a given state
on a given input symbol.
- can also have NULL moves (moves without an input symbol).

• A DFA
- has one and only one move from a given state on a given input
symbol.

Note: Sometimes, it is not easy to convert an RE directly to a DFA. Instead, you can
first convert the regular expression to an NFA and then the NFA to a DFA.


Conversion from NFA to DFA

• Suppose there is an NFA { Q, ∑, δ, q0, F } which recognizes a
language L.
• Then the DFA { Q’, ∑, δ’, q0, F’ } can be constructed for
language L as:
Step 1: Initially Q’ = ɸ.
Step 2: Add q0 to Q’. Then find the transitions from this start state.
Step 3: For each state in Q’, find the possible set of states for
each input symbol using the transition function of the NFA.
If this set of states is not in Q’, add it to Q’.
Step 4: The final states of the DFA are all states that contain a
final state of the NFA (a member of F).
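
Steps 1 to 4 can be sketched as a worklist algorithm over sets of NFA states. The NFA used here is a hypothetical one over {a, b} that accepts strings ending in "ab"; its transition function, the function name `nfa_to_dfa` and the use of `frozenset` for DFA states are assumptions for illustration. ε-moves are not handled in this version.

```python
from collections import deque

def nfa_to_dfa(delta, alphabet, start, finals):
    """Subset construction for an NFA without ε-moves."""
    start_set = frozenset([start])            # Step 2: begin with {q0}
    dfa_states = {start_set}
    dfa_delta = {}
    queue = deque([start_set])
    while queue:                              # Step 3: process every discovered state
        current = queue.popleft()
        for symbol in sorted(alphabet):
            target = frozenset(
                nxt for q in current for nxt in delta.get((q, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in dfa_states:      # a new set of NFA states becomes a DFA state
                dfa_states.add(target)
                queue.append(target)
    # Step 4: a DFA state is final if it contains any final state of the NFA
    dfa_finals = {s for s in dfa_states if s & finals}
    return dfa_states, dfa_delta, start_set, dfa_finals

# Hypothetical NFA: strings over {a, b} that end in "ab"
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
states, dfa_delta, dfa_start, dfa_finals = nfa_to_dfa(NFA_DELTA, {"a", "b"}, "q0", {"q2"})
for state in sorted(states, key=sorted):
    print(sorted(state), "final" if state in dfa_finals else "")
```

For this NFA only three of the eight possible subsets are reachable, which is why the construction explores states on demand instead of enumerating all subsets.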

E.g., Consider the following NFA and draw its DFA:

Solution:
From the figure, we have the NFA { Q, ∑, δ, q0, F }, where

Q = {q0, q1, q2}, ∑ = {a, b}, F = {q2},

and δ is given by the transition table of the figure.

Now, we need to find the DFA { Q’, ∑, δ’, q0, F’ }.


Step 1: Initially Q’ = ɸ,
Step 2: Q’ = {q0},
Step 3: For each state in Q’, find the states for each input symbol.

• Currently, the only state in Q’ is q0. Find the moves from q0 on input symbols
a and b separately, using the transition function of the NFA, and update
the transition table of the DFA.

• Now { q0, q1 } will be considered as a single state. Since its entry
is not in Q’, add it to Q’. So,
Q’ = { q0, { q0, q1 } }

• The moves from state { q0, q1 } on the different input symbols are not yet present
in the transition table of the DFA, so we calculate them as:

δ’ ( { q0, q1 }, a ) = δ ( q0, a ) ∪ δ ( q1, a ) = { q0, q1 }
δ’ ( { q0, q1 }, b ) = δ ( q0, b ) ∪ δ ( q1, b ) = { q0, q2 }

• Now { q0, q2 } will be considered as a single state. As its entry is not in Q’,
add it to Q’. So,
Q’ = { q0, { q0, q1 }, { q0, q2 } }

• Likewise, the moves from state { q0, q2 } on the different input symbols are not yet
present in the transition table of the DFA, so we calculate them as:

δ’ ( { q0, q2 }, a ) = δ ( q0, a ) ∪ δ ( q2, a ) = { q0, q1 }
δ’ ( { q0, q2 }, b ) = δ ( q0, b ) ∪ δ ( q2, b ) = { q0 }


• As no new state is generated, the final state of the DFA is the state
which has q2 as a component, i.e., { q0, q2 }

• The parameters of the DFA are as follows:
Q’ = { q0, { q0, q1 }, { q0, q2 } }
∑ = { a, b }
F’ = { { q0, q2 } } and transition function δ’ as:

[Table: δ’ (Transitions of DFA)]

Therefore, the final DFA for the given NFA is:

[Figure: state diagram of the final DFA]

Practice:
Convert the following REs to an NFA and then to a DFA
• (0|1)*11|0*
• Strings of alternating 0 and 1
• aba*|(ba|b)

More on conversion of an NFA (with ε) to a DFA

Subset construction –
• Idea: a subset of the set of all NFA states is treated as equivalent
and becomes one DFA state;
• Algorithm: simulates movement through the NFA;
• Key problem: how to treat ε transitions?


E.g., Consider the NFA for the given RE of ((0|1)*00) | 0 drawn
earlier, and let's convert it into a DFA:

[Figures: step-by-step subset construction of the DFA for ((0|1)*00) | 0, starting from q0]


Minimization of a finite automaton –
is very useful in making compilers execute faster, as it
removes identical operations.
Why minimize the states in a DFA?
• Reduce the amount of space required to store the DFA.
• Reduce the complexity of understanding how it works and
increase the execution speed of the program.
• When we minimize an automaton, we merge two or more
states into a single equivalent state.
• Merging states should produce a smaller automaton that
accomplishes exactly the same task as the original automaton.

How to minimize DFA?


The two popular methods for minimizing a DFA are:
• Equivalence theorem
• Table filling method
Minimization of DFA using Equivalence theorem
Before applying this method, eliminate all dead states and
inaccessible states from the given DFA (if any):

• Dead State
All non-final states which transit to themselves
for all input symbols in Σ are called dead states.

• Inaccessible State
All states which can never be reached from the
initial state are called inaccessible states.


State 0 1
A B D
B C D
C C D
D E D
E C D
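
The merging of equivalent states can be sketched as repeated partition refinement: start from the split into final and non-final states, then keep splitting any block whose members disagree on which block each input symbol leads to. The transition table below is the one shown above; the choice of E as the only accepting state and A as the start state is an assumption made for this example, since the figure that marks them is not part of the text. The function name `minimize` is also illustrative.

```python
def minimize(states, alphabet, delta, finals):
    """Group the states of a complete DFA into blocks of equivalent states."""
    partition = [set(finals), set(states) - set(finals)]
    partition = [block for block in partition if block]
    while True:
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = {}
        for q in states:
            # States stay together only if they are in the same block now and
            # every symbol sends them to the same block.
            signature = (block_of[q],) + tuple(block_of[delta[(q, a)]] for a in alphabet)
            refined.setdefault(signature, set()).add(q)
        new_partition = list(refined.values())
        if len(new_partition) == len(partition):   # nothing was split: done
            return new_partition
        partition = new_partition

TABLE = {           # state: (next state on 0, next state on 1)
    "A": ("B", "D"),
    "B": ("C", "D"),
    "C": ("C", "D"),
    "D": ("E", "D"),
    "E": ("C", "D"),
}
DELTA = {(q, sym): nxt for q, row in TABLE.items() for sym, nxt in zip("01", row)}

blocks = minimize("ABCDE", "01", DELTA, finals={"E"})   # finals={"E"} is assumed
print(sorted(sorted(block) for block in blocks))
```

Under that assumption A, B and C end up in one block, so the five-state table collapses to three states.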


Practice:
Apply the equivalence theorem to minimize the following DFAs:

Problem 1:
After eliminating inaccessible states

Problem 2: After eliminating inaccessible states


Implementing DFAs
• A DFA can be implemented by a 2D table indexed by “states” and “input
symbols”. This is called a transition table;
• For every transition δ, the table entry gives the next state;
• DFA execution is performed as follows:
for the given state and input, read the corresponding state in
the table and move to that state;
• Very efficient method;
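
A table-driven implementation might look like the sketch below, where each row of a 2D list is a state and each column an input symbol. The DFA encoded here (strings over {a, b} ending in "ab"), the state numbering and the names `TABLE`, `SYMBOLS` and `run_dfa` are assumptions for illustration only.

```python
SYMBOLS = {"a": 0, "b": 1}     # column index of each input symbol

TABLE = [
    # a  b
    [1, 0],    # state 0: start
    [1, 2],    # state 1: just read 'a'
    [1, 0],    # state 2: just read "ab" (accepting)
]
ACCEPTING = {2}

def run_dfa(text):
    """Execute the DFA: one table lookup per input character."""
    state = 0
    for ch in text:
        col = SYMBOLS.get(ch)
        if col is None:          # symbol outside the alphabet: reject
            return False
        state = TABLE[state][col]
    return state in ACCEPTING

for text in ["ab", "aab", "abab", "aba", ""]:
    print(repr(text), run_dfa(text))
```

Each input character costs a single indexed lookup, which is where the efficiency of the method comes from.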

E.g.,


Do you have any questions/clarifications?

… few minutes.
