
Compiler Design Overview

By
Dr. Prem Nath
Associate Professor
Computer Science & Engineering Department,
H N B Garhwal University
(A Central University)

Table of Contents
1. Introduction
2. Language Processors
3. Compiler
4. Interpreter
5. Compiler vs. Interpreter
6. Language Processing System
7. Back End and Front End of Compiler
8. Syntax Analysis Overview
9. Synthesis Overview



Introduction

• Programs are written in programming languages.
• Programming languages have a defined syntax, much like English.
• A computer can execute only machine-level language.
• Hence the need for language translators.



Language Processor
A language processor translates a program written in one programming language into an equivalent program written in another programming language.

The two main kinds of language processors are compilers and interpreters.



Compiler
➢ Language Translator
➢ Software System
➢ Can read a Program in one Language (Source
Language) and Translate it into an Equivalent
Program in Another Language (Target Language).
[Diagram] Source program → Compiler → Target program (machine executable); the compiler reports errors. At run time: Input → Target program → Output


Compiler…

❖ Machine architectures shape compiler design.
❖ The study of compilers touches programming languages, machine architecture, software engineering, algorithms, and more.



Interpreter
❖ Another kind of Language Processor
❖ Translates the source code into target code line by line.
❖ Appears to execute the operations directly.
[Diagram] Source program + Input → Interpreter → Output; the interpreter reports errors



Compiler vs. Interpreter
➢ The interpreter translates and executes the source code line by line, whereas the compiler first translates the entire program and then executes it.
➢ The compiled program runs faster, since the interpreter must translate and execute each line in turn, which takes more time.
➢ An interpreter can give better error diagnostics, since it works line by line.



Compiler vs. Interpreter…
➢ Compilers generate machine code, whereas interpreters
interpret intermediate code.
➢ Interpreters are easier to write and can provide better
error messages (symbol table is still available).
➢ Interpreted execution is typically at least five times slower than the machine code generated by compilers.
➢ Interpreters also require much more memory than
machine code generated by compilers.
❖ Examples: Perl, Python, Unix Shell, Java, BASIC, LISP



Hybrid Compiler
❖ Combines Features of Compiler and Interpreter.
[Diagram] Java source code → Compiler → Intermediate code (bytecode, machine independent) → Interpreter (Java Virtual Machine, platform dependent) → Output



Language Processing System

Language Processing System…
❖ Preprocessor: Expands the macros in the source program.
❖ Assembler: Processes the assembly-language program produced by the compiler and generates relocatable machine code.
❖ Large programs are often compiled in pieces (object files).
❖ Linker: Links the relocatable machine code with other relocatable object files and library files; it resolves external memory references.
❖ Loader: Loads all the executable object files into main memory for execution.
Front-end and Back-end Division
[Diagram] Source code → Front end (analysis) → IR → Back end (synthesis) → Machine code; errors are reported by both halves

The front end maps legal code into an IR (intermediate representation)
The back end maps the IR onto target machine code
This division allows multiple front ends to share one back end
Multiple passes may result in better code
Front End

[Diagram] Source code → Scanner → tokens → Parser → IR; errors are reported

The front end must:
Recognize legal code
Report errors
Produce the IR
The scanning stage is also called lexical analysis or scanning.
Front End…
Scanner (lexical analyzer):
Groups characters into meaningful sequences called lexemes
Emits tokens of the form <token-name, attribute-value>
Maps characters into tokens, the basic unit of syntax
Eliminates white space (tabs, blanks, comments)
A toy scanner sketch follows below.
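
To make the scanner's job concrete, here is a minimal sketch in Python, assuming an invented toy language and invented token names (real scanners are usually generated from regular-expression specifications by tools such as LEX or Flex):

import re

# Token specification: (token-name, regular expression), tried in order.
TOKEN_SPEC = [
    ("NUM",      r"\d+"),            # integer literals
    ("ID",       r"[A-Za-z_]\w*"),   # identifiers
    ("ASSIGN",   r"="),
    ("OP",       r"[+\-*/]"),
    ("LPAREN",   r"\("),
    ("RPAREN",   r"\)"),
    ("SKIP",     r"[ \t]+"),         # white space: discarded
    ("MISMATCH", r"."),              # anything else is a lexical error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def scan(source):
    """Group characters into lexemes and emit <token-name, attribute-value> pairs."""
    for match in MASTER_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                                  # eliminate blanks and tabs
        if kind == "MISMATCH":
            raise SyntaxError(f"illegal character {lexeme!r}")
        yield (kind, lexeme)

print(list(scan("position = initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'), ('OP', '+'),
#  ('ID', 'rate'), ('OP', '*'), ('NUM', '60')]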
Front End…

Parser:
Recognizes context-free syntax
Guides context-sensitive analysis
Constructs the IR
Produces meaningful error messages
Attempts error correction
Parser generators such as YACC automate much of this work
Front End…
➢ Context-free grammars are used to specify programming-language syntax.
➢ The parser constructs a parse tree (a recursive-descent sketch follows below).
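
To make the parser's job concrete, here is a minimal hand-written recursive-descent sketch in Python, assuming the token format of the scanner sketch above and an invented expression grammar (production parsers are often generated by tools such as YACC, Bison, or ANTLR):

# Grammar (invented for illustration):
#   expr   -> term  (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUM | ID | '(' expr ')'
# Trees are nested tuples: leaves are tokens, interior nodes are (op, left, right).

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def eat(kind):
        nonlocal pos
        token = peek()
        if token[0] != kind:
            raise SyntaxError(f"expected {kind}, found {token}")
        pos += 1
        return token

    def factor():
        kind, _ = peek()
        if kind in ("NUM", "ID"):
            return eat(kind)
        eat("LPAREN")
        node = expr()
        eat("RPAREN")
        return node

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = eat("OP")
            node = (op[1], node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = eat("OP")
            node = (op[1], node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

tokens = [("ID", "initial"), ("OP", "+"), ("ID", "rate"), ("OP", "*"), ("NUM", "60")]
print(parse(tokens))
# ('+', ('ID', 'initial'), ('*', ('ID', 'rate'), ('NUM', '60')))

Because * and / are handled in term, one level below + and - in expr, operator precedence falls directly out of the grammar.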
Back End…
[Diagram] IR → Instruction selection → Register allocation → Machine code; errors are reported

Translate the IR into target machine code:
Choose instructions for each IR operation
Decide what to keep in registers at each point
Ensure conformance with system interfaces
Back End…

Instruction selection:
Produce compact, fast code
Use the available addressing modes
Back End…


Register allocation:
Have each value in a register when it is used
Registers are a limited resource
Optimal allocation is difficult (a linear-scan sketch follows below)
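
As a rough illustration of the register-allocation problem, here is a hypothetical greedy, linear-scan style sketch in Python; the live intervals, register names, and spill handling are all invented for illustration and are far simpler than what a production allocator does:

# Each value is (name, start, end): the interval over which it is live.
def linear_scan(intervals, registers):
    """Greedy linear-scan allocation: assign a free register if one exists,
    otherwise spill (a real allocator would spill the interval ending last)."""
    free = list(registers)
    active = []                      # (end, name, register) of live values
    assignment = {}
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Free the registers of values whose intervals have already ended.
        for iv in [iv for iv in active if iv[0] <= start]:
            active.remove(iv)
            free.append(iv[2])
        if free:
            reg = free.pop()
            assignment[name] = reg
            active.append((end, name, reg))
        else:
            assignment[name] = "SPILL"       # not enough registers
    return assignment

live_ranges = [("a", 0, 4), ("b", 1, 3), ("c", 2, 6), ("d", 5, 7)]
print(linear_scan(live_ranges, ["r1", "r2"]))
# {'a': 'r2', 'b': 'r1', 'c': 'SPILL', 'd': 'r1'}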
Back End…
Traditional Three Pass Compiler
[Diagram] Source code → Front End → IR → Middle End → IR → Back End → Machine code; errors are reported throughout

Code improvement (the middle end) analyzes and transforms the IR
The goal is to reduce run time
Middle End (Optimizer)
Modern optimizers are usually built as a set of passes
Typical passes (a sketch of two of them follows below):
Constant propagation
Common sub-expression elimination
Redundant store elimination
Dead code elimination
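
As an illustration of such passes, here is a hypothetical sketch in Python of constant propagation (with folding) followed by dead-code elimination over a straight-line, three-address-style IR; the instruction format is invented for illustration:

# Straight-line IR: each instruction is (dest, op, arg1, arg2);
# arguments are constants (int) or names (str), and 'copy' ignores arg2.
code = [
    ("t1", "copy", 4, None),
    ("t2", "mul", "t1", 2),
    ("t3", "add", "t2", "x"),
    ("t4", "mul", "t1", 3),        # t4 is never used: dead code
    ("y",  "copy", "t3", None),
]

def constant_propagation(code):
    """Substitute known constant values and fold constant expressions."""
    consts, out = {}, []
    for dest, op, a, b in code:
        a = consts.get(a, a)       # replace names whose values are known constants
        b = consts.get(b, b)
        if op == "copy" and isinstance(a, int):
            consts[dest] = a
        elif op in ("add", "mul") and isinstance(a, int) and isinstance(b, int):
            consts[dest] = a + b if op == "add" else a * b
            op, a, b = "copy", consts[dest], None   # fold to a constant copy
        out.append((dest, op, a, b))
    return out

def dead_code_elimination(code, live_out):
    """Walk backwards, keeping only instructions whose result is needed."""
    live, out = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest in live:
            out.append((dest, op, a, b))
            live.discard(dest)
            live |= {x for x in (a, b) if isinstance(x, str)}
    return list(reversed(out))

optimized = dead_code_elimination(constant_propagation(code), live_out={"y"})
for instr in optimized:
    print(instr)
# ('t3', 'add', 8, 'x')
# ('y', 'copy', 't3', None)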
Phases of Compiler

Translation Overview - Lexical Analysis
A lexical analyzer (LA) can be generated automatically from regular-expression specifications; LEX and Flex are two such tools
The lexical analyzer is a DFA
Why is LA separate from parsing?
➢ Simplification of design (a software-engineering reason)
➢ I/O issues are limited to the LA alone
➢ An LA based on finite automata is more efficient to implement than the pushdown automata used for parsing (which require a stack)

Translation Overview - Syntax Analysis or Parsing
Syntax analyzers (parsers) can be generated automatically from several variants of context-free grammar specifications
➢ LL(1) and LALR(1) are the most popular ones
➢ ANTLR (for LL(1)), YACC and Bison (for LALR(1)) are such tools
Parsers are deterministic push-down automata
Parsers cannot handle context-sensitive features of programming languages; e.g.,
➢ Variables are declared before use
➢ Types match on both sides of assignments
➢ Parameter types and number match in declaration and use
Translation Overview - Semantic Analysis
Checks semantic consistency that cannot be handled at the parsing stage
Stores type information in the symbol table or the syntax tree
➢ Types of variables, function parameters, array dimensions, etc.
➢ Used not only for semantic validation but also by subsequent phases of compilation
➢ The static semantics of programming languages can be specified using attribute grammars
A toy symbol-table sketch follows below.
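
A minimal, hypothetical sketch of how a symbol table supports these checks (declared-before-use and type matching on assignment); the class and type names are invented for illustration:

class SymbolTable:
    """A toy symbol table mapping names to their declared types."""
    def __init__(self):
        self.symbols = {}

    def declare(self, name, type_):
        if name in self.symbols:
            raise TypeError(f"'{name}' is already declared")
        self.symbols[name] = type_

    def lookup(self, name):
        if name not in self.symbols:
            raise NameError(f"'{name}' used before declaration")
        return self.symbols[name]

def check_assignment(table, target, value_type):
    """Verify that the types on both sides of an assignment match."""
    if table.lookup(target) != value_type:
        raise TypeError(f"cannot assign a {value_type} value to '{target}'")

table = SymbolTable()
table.declare("rate", "float")
table.declare("count", "int")
check_assignment(table, "rate", "float")      # OK
# check_assignment(table, "count", "float")   # would raise TypeError
# table.lookup("position")                    # would raise NameError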

Translation Overview - Intermediate Code Generation
While generating machine code directly from source code is possible, it entails two problems:
With m source languages and n target machines, we would need to write m × n compilers
The code optimizer, one of the largest and most difficult-to-write components of any compiler, could not be reused
By converting source code to an intermediate code, a machine-independent code optimizer may be written
Intermediate code must be easy to produce and easy to translate to machine code (a three-address-code sketch follows below)
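
As an illustration, here is a hypothetical sketch in Python that walks an expression tree (nested tuples, as in the parser sketch above) and emits three-address code, one classical intermediate form:

def gen_three_address(tree):
    """Flatten an expression tree into (dest, op, arg1, arg2) quadruples."""
    code = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"t{temp_count}"

    def walk(node):
        kind = node[0]
        if kind in ("ID", "NUM"):
            return node[1]                    # leaves are their own addresses
        op, left, right = node
        a, b = walk(left), walk(right)
        dest = new_temp()
        code.append((dest, op, a, b))         # dest := a op b
        return dest

    walk(tree)
    return code

tree = ("+", ("ID", "initial"), ("*", ("ID", "rate"), ("NUM", "60")))
for quad in gen_three_address(tree):
    print(quad)
# ('t1', '*', 'rate', '60')
# ('t2', '+', 'initial', 't1')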

Different Types of Intermediate Code
The type of intermediate code deployed depends on the application
Quadruples, triples, indirect triples, and abstract syntax trees are the classical forms used for machine-independent optimizations and machine code generation (a small comparison follows below)
Static Single Assignment (SSA) form is a more recent form that enables more effective optimizations; conditional constant propagation and global value numbering are more effective on SSA
The Program Dependence Graph (PDG) is useful in automatic parallelization, instruction scheduling, and software pipelining
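
For example, the same small expression can be written as quadruples or as triples; the tuple encodings below are invented for illustration:

# The expression  a + a * b  in two classical intermediate forms.

# Quadruples: (op, arg1, arg2, result) -- every result has an explicit name.
quadruples = [
    ("*", "a", "b", "t1"),
    ("+", "a", "t1", "t2"),
]

# Triples: (op, arg1, arg2) -- a result is referred to by the position of the
# instruction that computes it, so no temporary names are needed.
triples = [
    ("*", "a", "b"),             # (0)
    ("+", "a", ("ref", 0)),      # (1): second operand is the result of triple 0
]

# Indirect triples keep a separate list of pointers into the triple table,
# which lets the optimizer reorder instructions without rewriting references.
instruction_order = [0, 1]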
Translation Overview - Code Optimization

Translation Overview - Code Generation

Machine-Dependent Optimizations
Peephole optimizations: analyze a sequence of instructions in a small window (peephole) and, using preset patterns, replace them with a more efficient sequence (a sketch follows below)
Redundant instruction elimination: e.g., replace the sequence [LD A,R1][ST R1,A] by [LD A,R1]
Eliminate “jump to jump” instructions
Use machine idioms (use INC instead of LD and ADD)
Instruction scheduling (reordering) to eliminate pipeline interlocks and to increase parallelism
Trace scheduling to increase the size of basic blocks and increase parallelism
Software pipelining to increase parallelism in loops
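
A tiny, hypothetical peephole pass in Python that removes a store immediately following a load of the same location, matching the [LD A,R1][ST R1,A] example above; the instruction encoding is invented for illustration:

def peephole(code):
    """Drop a store [ST Rk,X] that immediately follows a load [LD X,Rk]."""
    out, i = [], 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "LD" and code[i + 1][0] == "ST"
                and code[i][1] == code[i + 1][2]      # same memory location X
                and code[i][2] == code[i + 1][1]):    # same register Rk
            out.append(code[i])      # keep the load, drop the redundant store
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

code = [("LD", "A", "R1"), ("ST", "R1", "A"), ("ADD", "R1", "R2")]
print(peephole(code))
# [('LD', 'A', 'R1'), ('ADD', 'R1', 'R2')]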

Compilation Process and T-Diagram
➢ S: Source language
➢ T: Target language
➢ I: Implementation language (the language in which the compiler is implemented)
[T-diagram] The compiler is drawn as a “T”: the source language S and the target language T form the top arms, and the implementation language I forms the stem



Compilation Process and T-Diagram…
➢ A: Source language, B: Target language
➢ Py: Python, AL: Assembly language, M: Machine language
[T-diagram chain] An A→B compiler written in Py; a Py→M compiler written in C++; a C++→M compiler written in C; a C→M compiler written in AL; and an AL→M assembler running on machine M
Compilation Process and Bootstrapping
➢ Bootstrapping is the process of building a compiler for a complex language using a compiler implemented in a simpler language.
➢ Each step produces a compiler that is more powerful than the one used to build it.
➢ A complete compiler can thus be designed incrementally using bootstrapping.
➢ A compiler can compile its own source program into a program closer to the machine language.
[T-diagram chain] A C++→M compiler written in C is translated by a C→M compiler written in AL, which in turn is translated by an AL→M assembler running on machine M
Cross Compiler
➢ Cross Compiler is a compiler which runs on one
machine and produces output code for another machine.

[T-diagrams: a Pascal compiler targeting C, implemented in C++, shown on the first machine and on the second machine]


Cross Compiler…
➢ Cross Compiler is a compiler which runs on one
machine and produces output code for another
machine.

[T-diagrams: the Pascal-to-C compiler and the C++ compiler used to port it between machines]



Thanks

