
Course information

• Theory class (CSL1601)
  • Monday 10am-11am
  • Wednesday 11am-12noon
  • Thursday 11am-12noon
• Lab (CSP1601)
  • Thursday 2pm-5pm
  • Labs will start from the 2nd or 3rd week.

• Any additional resources used will be announced.


What is a compiler?
• A compiler is a program that translates a program (the source program/code) written in one programming
language into an equivalent program (the target program/code) written in another language.

• It also checks whether the source program conforms to the specification of the source language, and so may
report error messages and warnings.

• The term is primarily used for programs that translate source code from a high-level programming language
to a low-level language (assembly code, object code, machine code).
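
As a small illustration of both roles (translation and checking), the sketch below uses Python's built-in compile() function, which translates Python source text into a bytecode object and raises SyntaxError when the source does not conform to the language. The helper name check is hypothetical, and Python bytecode is only one example of a target language.

```python
import types

def check(source):
    """Translate source text to a code object, or report the error found."""
    try:
        # compile() is Python's own source-to-bytecode translator.
        return compile(source, "<example>", "exec")
    except SyntaxError as err:
        # The source does not conform to the language specification.
        return f"error: {err.msg}"

good = check("x = 1 + 2")   # a code object (the target program)
bad = check("x = 1 +")      # an error message string
print(isinstance(good, types.CodeType), bad)
```
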

Some compiler implementations

• C compiler
  • gcc (GNU project, open source), clang (LLVM project, open source), etc.
• C++ compiler
  • g++ (GNU project, open source), clang++ (LLVM project, open source), etc.
• Java compiler
  • javac (OpenJDK, open source)

Phases of compilation

[Figure: flowchart of the phases of compilation. Source program → Lexical analysis → Tokens → Syntax analysis → Parse tree → Semantic analysis → Semantic correctness → Intermediate code generation → Intermediate code → Optimization → Optimized intermediate code → Target code generation → Target code → Optimization → Optimized target code. Symbol table management and Error handling & recovery appear alongside the phases.]

Phases of compilation

• Lexical analysis
  • Scans the input program to identify valid words, and removes comments and extra white space.
  • The keywords, variables, numbers and symbols of the program are called tokens* and are passed
    on to the syntax analyzer.
  • Handles directives and user-defined macros such as #include and #define in programming
    languages like C.
  • Implemented as finite automata.

* An instance of a token is usually referred to as a lexeme.
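
The scanning step can be sketched with regular expressions, which describe the same class of patterns as finite automata. This is a minimal sketch for a tiny C-like language; the token classes in TOKEN_SPEC and the function name tokenize are assumptions made for illustration, not part of any real compiler.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustration only).
# Order matters: comments before symbols, keywords before identifiers.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:int|if|else|while|return)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("SYMBOL",  r"[-+*/=;(){}<>]"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)

def tokenize(source):
    """Return a list of (token class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("COMMENT", "SKIP"):
            continue  # comments and white space are dropped
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {m.start()}")
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("int x = 42; // answer\n"))
```

Here "int" is a lexeme of the token class KEYWORD and "42" is a lexeme of the class NUMBER; the comment never reaches the syntax analyzer.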


Phases of compilation

• Syntax analysis
  • Checks the syntactic correctness of the input program by constructing a parse tree/syntax
    tree (using the pre-defined grammar of the language and the input program).
  • For the input to be syntactically correct, there must be a rule (of the programming
    language) that can generate what is written.
  • Works hand in hand with lexical analysis.
  • Implemented as push-down automata.
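
A recursive-descent parser is one simple way to see this phase at work: each grammar rule becomes a function, and the call stack plays the role of the push-down stack. The sketch below assumes a small arithmetic grammar (expr, term, factor) chosen for illustration, and returns a syntax tree as nested tuples; the function name parse is hypothetical.

```python
import re

def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree.

    Assumed grammar (illustration only):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """
    # A crude lexer; characters outside the grammar are silently ignored.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("a + 2 * (b - 3)"))
```

An input such as "a + * b" is rejected with a SyntaxError because no rule of the assumed grammar can generate it.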


Phases of compilation

• Semantic analysis
  • The semantics (of what is written) depend on the programming language.
  • A common check is on the types of variables and expressions.
  • The variable type check depends on whether the language is strongly typed or loosely typed.
  • Expression check: whether the operands can be used with the operator.
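
The expression check can be sketched as a walk over a syntax tree that looks up each (operator, operand types) combination in a rule table. The rule table RULES, the type names and the function type_of are assumptions for illustration; a real language defines its own rules. The symbols dictionary stands in for the symbol table.

```python
# Hypothetical typing rules: (operator, left type, right type) -> result type.
NUMERIC = ("int", "float")
RULES = {}
for op in "+-*/":
    for lt in NUMERIC:
        for rt in NUMERIC:
            RULES[(op, lt, rt)] = "float" if "float" in (lt, rt) else "int"
RULES[("+", "string", "string")] = "string"  # assumed: '+' concatenates strings

def type_of(node, symbols):
    """Return the type of an expression tree, or raise on a semantic error.

    node is a literal (int or float), a variable name (str),
    or a tuple (operator, left subtree, right subtree).
    """
    if isinstance(node, tuple):
        op, left, right = node
        lt, rt = type_of(left, symbols), type_of(right, symbols)
        if (op, lt, rt) not in RULES:
            raise TypeError(f"operator {op!r} cannot be applied to {lt} and {rt}")
        return RULES[(op, lt, rt)]
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if node not in symbols:
        raise NameError(f"undeclared variable {node!r}")
    return symbols[node]

print(type_of(("+", "x", ("*", 2, 1.5)), {"x": "int"}))
```

With these assumed rules, subtracting an int from a string is syntactically fine but is rejected here, which is the kind of error this phase reports.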


Phases of compilation

• Intermediate code generation
  • Generates intermediate code that can later be translated to different machine codes (for
    different processors).
  • Generating machine code directly from source code is possible, but there are problems:
    • With m languages and n target machines, we need to write m × n compilers. With a shared
      intermediate code, m front ends and n back ends suffice.
    • The code optimizer, which is one of the largest and most difficult-to-write components of
      any compiler, cannot be reused.
  • By converting source code to an intermediate code, a machine-independent code optimizer
    may be written.
  • Intermediate code must be easy to produce and easy to translate to machine code.
    • A sort of universal assembly language.
    • Should not contain any machine-specific parameters (registers, addresses, etc.).
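
One widely used form of intermediate code is three-address code, in which every instruction has at most one operator and results go into compiler-generated temporaries. The sketch below assumes the syntax tree is given as nested tuples (operator, left, right) with strings at the leaves; the function name gen_tac and the temporary names t1, t2, ... are hypothetical.

```python
def gen_tac(tree):
    """Translate a nested-tuple expression tree into three-address code.

    Returns (code, result) where code is a list of
    (dest, left, op, right) tuples and result names the final value.
    """
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)          # a leaf: variable name or constant
        op, left, right = node
        l = walk(left)                # operands are evaluated first
        r = walk(right)
        dest = new_temp()
        code.append((dest, l, op, r))
        return dest

    return code, walk(tree)

code, result = gen_tac(("+", "a", ("*", "b", "c")))
for dest, left, op, right in code:
    print(f"{dest} = {left} {op} {right}")
```

For a + b * c this yields t1 = b * c followed by t2 = a + t1. Nothing in the output mentions registers or addresses, so the same code can be handed to different back ends.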


Phases of compilation

• Intermediate code optimization
  • The intermediate code generation process introduces many inefficiencies.
    • Extra copies of variables, using variables instead of constants, repeated evaluation
      of expressions, etc.
  • Code optimization removes such inefficiencies and improves the code.
    • The improvement may be in time, space, or power consumption.
  • It changes the structure of programs, sometimes beyond recognition.
    • Inlines functions, unrolls loops, eliminates some programmer-defined variables, etc.
  • Code optimization consists of a bunch of heuristics, and the percentage of improvement
    depends on the program (it may also be zero).
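
Two of the inefficiencies named above, variables used where constants would do and repeated evaluation of expressions, can be removed by constant folding and common subexpression elimination. The sketch below works on one straight-line block of three-address instructions and assumes every name is assigned at most once; the function name optimize and the instruction format are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def optimize(block):
    """Fold constants and remove repeated expressions in a straight-line block.

    block is a list of (dest, left, op, right); operands are ints or names.
    Assumes each name is assigned at most once.
    Returns (remaining instructions, {eliminated name: constant or other name}).
    """
    replaced = {}   # eliminated name -> constant value or equivalent name
    seen = {}       # (left, op, right) -> name that already holds the value
    out = []
    for dest, left, op, right in block:
        left = replaced.get(left, left)
        right = replaced.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            replaced[dest] = OPS[op](left, right)       # constant folding
        elif (left, op, right) in seen:
            replaced[dest] = seen[(left, op, right)]    # common subexpression
        else:
            seen[(left, op, right)] = dest
            out.append((dest, left, op, right))
    return out, replaced

block = [
    ("t1", 2, "*", 3),
    ("t2", "a", "+", "t1"),
    ("t3", "a", "+", 6),
    ("t4", "t2", "*", "t3"),
]
print(optimize(block))
```

On this block four instructions shrink to two: t1 is folded to 6, and t3 is recognised as the same value as t2. On a block with no constants and no repeats, the improvement is zero.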


Phases of compilation

• Target code generation
  • Generates the actual machine code that is executed by the processor.
  • Target code depends on the machine instructions, addressing modes, number of registers, etc.
  • Must handle all aspects of the machine architecture: registers, pipelining, cache,
    multiple function units, etc.
  • Storage allocation decisions are made here. Register allocation and assignment are the
    most important problems.
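
To make the register problem concrete, the sketch below translates three-address code into assembly for an invented load/store machine with a small, fixed number of registers. The instruction names (LOAD, LOADI, ADD, ...), the register names R0, R1, ... and the function gen_target are all assumptions for illustration. A register is released after the last use of the value it holds, and running out of registers is reported as an error rather than handled by spilling.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_target(tac, num_regs=4):
    """Translate (dest, left, op, right) tuples into assembly strings.

    Operands are strings: variable names, temporaries, or decimal literals.
    """
    # Index of the last instruction that reads each name.
    last_use = {}
    for i, (_, left, _, right) in enumerate(tac):
        last_use[left] = last_use[right] = i

    free = list(range(num_regs))   # free register numbers, lowest first
    where = {}                     # name -> register currently holding it
    asm = []

    def take():
        if not free:
            raise RuntimeError("out of registers (spilling is not implemented)")
        return free.pop(0)

    def load(name):
        if name not in where:
            where[name] = take()
            opcode = "LOADI" if name.isdigit() else "LOAD"
            asm.append(f"{opcode} R{where[name]}, {name}")
        return where[name]

    for i, (dest, left, op, right) in enumerate(tac):
        rl, rr = load(left), load(right)
        # Release registers whose values are not needed after this instruction.
        for name in dict.fromkeys((left, right)):
            if last_use[name] == i:
                free.append(where.pop(name))
        free.sort()
        where[dest] = take()
        asm.append(f"{MNEMONIC[op]} R{where[dest]}, R{rl}, R{rr}")

    if tac:
        final = tac[-1][0]
        asm.append(f"STORE R{where[final]}, {final}")
    return asm

for line in gen_target([("t1", "b", "*", "c"), ("t2", "a", "+", "t1")]):
    print(line)
```

Real code generators must also choose among addressing modes and instruction variants; this sketch only shows why tracking which value lives in which register is central to the phase.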


Phases of compilation

• (Target) code optimization
  • Different kinds of optimizations can be performed on the generated target code.
  • Instruction scheduling (reordering) to eliminate pipeline interlocks and to increase
    parallelism.
  • Trace scheduling to increase the size of basic blocks and increase parallelism.
  • Software pipelining to increase parallelism in loops.
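
Instruction scheduling can be sketched as greedy list scheduling for a machine that issues one instruction per cycle. The machine model, the latencies and the function name schedule are assumptions for illustration, and only register dependences are tracked (memory dependences are ignored). An instruction may issue once every earlier instruction it depends on has produced its result; if none is ready, the pipeline stalls for that cycle.

```python
def schedule(instrs, latency):
    """Greedy list scheduling for a hypothetical single-issue pipeline.

    instrs is a list of (opcode, dest, sources); latency maps an opcode to
    the number of cycles before its result can be used.
    Returns the issue order as instruction indices, with None for a stall.
    """
    n = len(instrs)
    deps = [set() for _ in range(n)]   # deps[j]: (earlier index, required delay)
    for j in range(n):
        _, dj, sj = instrs[j]
        for i in range(j):
            opi, di, si = instrs[i]
            if di in sj:                    # j reads what i writes
                deps[j].add((i, latency[opi]))
            elif di == dj or dj in si:      # j overwrites: only keep the order
                deps[j].add((i, 1))

    issue = {}    # instruction index -> cycle in which it was issued
    order = []
    cycle = 0
    while len(issue) < n:
        ready = [j for j in range(n)
                 if j not in issue
                 and all(i in issue and issue[i] + d <= cycle for i, d in deps[j])]
        if ready:
            issue[ready[0]] = cycle         # earliest ready instruction first
            order.append(ready[0])
        else:
            order.append(None)              # pipeline interlock (stall)
        cycle += 1
    return order

program = [
    ("LOAD", "r1", ("a",)),
    ("ADD",  "r2", ("r1", "r1")),
    ("LOAD", "r3", ("b",)),
    ("ADD",  "r4", ("r3", "r3")),
]
print(schedule(program, {"LOAD": 2, "ADD": 1}))
```

With a two-cycle load, issuing the program in its original order would stall after each LOAD. The scheduler instead issues the second LOAD while the first is still completing, giving the order 0, 2, 1, 3 with no stalls.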
