
Chapter one

1 Introduction to Compiler Design


• Compilers are computer programs (software) that translate a program written in a high-level programming language (HLPL), the source program, into a language the computer understands: machine language.
• The source program is written in a high-level language such as C, C++, Java, C#, PHP or Perl.

Types of compilers
• Cross-compiler: a compiler that runs on machine A and produces code for a different machine B.
• Source-to-source compiler: a compiler that translates a source program written in one programming language into source code in another programming language.
• Native compiler: a compiler that translates a source program into object code for the same platform it runs on.
• Virtual machine: a software implementation of a machine (for example, a computer) that executes programs like a physical machine.
2. Language Processing System

• Any computer system is made of hardware and software.


• The hardware understands a language that humans cannot easily understand, so we write programs in a high-level language, which is easier for us to understand and remember.
• These programs are then fed through a series of tools and OS components to produce code that the machine can use. This is known as the language processing system.
[Figure: Language processing diagram]

• A high-level language program is converted into binary (machine) language in several phases.


• A compiler is a program that converts a high-level language program into assembly language.
• An assembler is a program that converts assembly language into machine-level language.
• Let us first understand how a program, using a C compiler, is executed on a host machine:
  1. The user writes a program in C (a high-level language).
  2. The C compiler compiles the program and translates it into an assembly program (low-level language).
  3. An assembler then translates the assembly program into machine code (an object file).
  4. A linker links all the parts of the program together for execution (executable machine code).
  5. A loader loads all of them into memory, and then the program is executed.
2.1. Preprocessor


A preprocessor is a tool that produces input for compilers.

Its purpose is to process directives.

Directives are specific instructions that start with the # symbol and end with a newline (NOT a semicolon).

A preprocessor may allow a user to define macros that are shorthands for longer constructs.

A macro is a rule that defines how an input sequence (e.g.
an identifier) is converted into a replacement output
sequence (e.g. some text).
Example: #define DTU "Debre Tabor University"
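The macro rule above can be sketched as a toy preprocessor in Python. This is an illustration only, not how a real C preprocessor is built: it assumes only object-like `#define NAME replacement` directives (no macro parameters, `#include` or conditionals), and unlike a real preprocessor it does not skip string literals or comments when substituting. The name `preprocess` is hypothetical.

```python
import re

# Matches a C-style identifier (letters, digits, underscore; not starting with a digit).
IDENT = re.compile(r"[A-Za-z_]\w*")


def preprocess(source: str) -> str:
    """Toy preprocessor: handles only object-like '#define NAME replacement' directives."""
    macros: dict[str, str] = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # A directive starts with '#' and ends at the newline (no semicolon).
            parts = stripped.split(maxsplit=2)
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue  # directives themselves are not passed on to the compiler
        # Replace whole identifiers only, so DTU does not match inside DTU_NAME.
        output.append(IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(output)


source = '#define DTU "Debre Tabor University"\nprintf("%s\\n", DTU);'
print(preprocess(source))  # printf("%s\n", "Debre Tabor University");
```

Matching whole identifiers with a regular expression (rather than plain string replacement) is what keeps a macro named `N` from corrupting an identifier such as `NN`.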
2.2. Compiler

• A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in machine language (ML), the target program.
• An important part of a compiler's job is showing errors to the programmer.

• Executing a program written in an HLL programming language is basically a two-part process:
• The source program must first be compiled (translated) into an object program.
• Then the resulting object program is loaded into memory and executed.
2.3. Assembler

• An assembler translates assembly language programs into machine code.
• The output of an assembler is called an object file, which contains a combination of machine instructions and the data required to place these instructions in memory.
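A classic two-pass assembler can be sketched as follows: pass 1 records the address of every label, and pass 2 translates each mnemonic and operand into bytes. The instruction set is entirely hypothetical (every instruction is two bytes, an opcode followed by a one-byte operand), and the returned dictionary of label addresses only stands in for the placement information a real object file carries.

```python
# Hypothetical instruction set: each instruction is 2 bytes (opcode, operand).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}


def clean(line: str) -> str:
    """Strip a ';' comment and surrounding whitespace."""
    return line.split(";")[0].strip()


def assemble(lines: list[str]) -> dict:
    # Pass 1: assign an address to every label ("name:").
    labels, address = {}, 0
    for line in map(clean, lines):
        if line.endswith(":"):
            labels[line[:-1]] = address
        elif line:
            address += 2
    # Pass 2: translate each mnemonic and operand into machine-code bytes.
    code = bytearray()
    for line in map(clean, lines):
        if not line or line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        operand = operands[0] if operands else "0"
        # An operand is either a label (resolved in pass 1) or a number such as 20 or 0x0A.
        value = labels[operand] if operand in labels else int(operand, 0)
        code += bytes([OPCODES[mnemonic], value])
    return {"code": bytes(code), "labels": labels}


program = ["start:", "LOAD 5", "ADD 0x0A   ; add ten", "STORE 20", "JMP start", "HALT"]
obj = assemble(program)
print(obj["code"].hex(" "))  # 01 05 02 0a 03 14 04 00 ff 00
```

Two passes are needed because a jump may refer to a label defined later in the program.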

2.4. Linker
• A linker is a computer program that links and merges various object files together to make an executable file.
• These object files may have been produced separately, for example by separate runs of the compiler and assembler.
The major task of a linker is to search for and locate referenced modules/routines in a program and to determine the memory locations where this code will be loaded, so that the program's instructions have absolute references.
Linking is performed as the last step in building a program, after compilation and assembly.
Source code → Compiler → Assembler → Object code → Linker → Executable file → Loader
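The linker's two jobs described above, building one table of where every symbol ends up and patching each reference with that absolute address, can be sketched as below. The object-file layout (a `code` list of integer words, a `defines` table of symbol offsets and a `refs` list of (offset, symbol) pairs) is a hypothetical simplification of real formats such as ELF, and the opcode values in the example are made up.

```python
def link(objects: list[dict]) -> tuple[list[int], dict[str, int]]:
    """Toy linker. Each object is a hypothetical dict:
    code: list of words, defines: {symbol: offset}, refs: [(offset, symbol)]."""
    # Pass 1: place modules one after another and record absolute symbol addresses.
    symbols: dict[str, int] = {}
    base = 0
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate definition of {name}")
            symbols[name] = base + offset
        base += len(obj["code"])
    # Pass 2: merge the code and patch every reference with its absolute address.
    image: list[int] = []
    for obj in objects:
        code = list(obj["code"])
        for offset, name in obj["refs"]:
            if name not in symbols:
                raise ValueError(f"undefined reference to {name}")
            code[offset] = symbols[name]
        image.extend(code)
    return image, symbols


# Hypothetical words: 0x10 = call (operand patched by the linker), 0xFF = halt.
main_o = {"code": [0x10, 0, 0xFF], "defines": {"main": 0}, "refs": [(1, "print_msg")]}
lib_o = {"code": [0x20, 0x21, 0x30], "defines": {"print_msg": 0}, "refs": []}
image, symbols = link([main_o, lib_o])
print(image)  # [16, 3, 255, 32, 33, 48]
```

The "undefined reference" error raised in pass 2 corresponds to the familiar linker error reported when a called routine is never defined in any object file.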
2.5. Loader

• The loader is part of the operating system and is responsible for loading executable files into memory and executing them.
• It calculates the size of a program (instructions and data) and creates memory space for it.
• It initializes various registers to initiate execution.
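The three loader steps listed above can be mimicked on a simulated machine. The `Executable` format, the default stack size and the register names `pc` (program counter) and `sp` (stack pointer) are assumptions for illustration; a real loader also deals with relocation, shared libraries and memory protection.

```python
from dataclasses import dataclass, field


@dataclass
class Executable:  # hypothetical executable format
    code: list[int]
    data: list[int]
    entry: int  # offset of the first instruction to execute


@dataclass
class Machine:  # simulated memory and registers
    memory: list[int] = field(default_factory=list)
    pc: int = 0  # program counter
    sp: int = 0  # stack pointer


def load(exe: Executable, stack_size: int = 16) -> Machine:
    # 1. Calculate the program size (instructions + data), plus room for a stack.
    size = len(exe.code) + len(exe.data) + stack_size
    # 2. Create the memory space and copy the code, then the data, into it.
    memory = [0] * size
    memory[: len(exe.code)] = exe.code
    memory[len(exe.code) : len(exe.code) + len(exe.data)] = exe.data
    # 3. Initialize registers: pc at the entry point, sp at the top of a downward-growing stack.
    return Machine(memory=memory, pc=exe.entry, sp=size)


machine = load(Executable(code=[1, 2, 3], data=[9, 9], entry=0), stack_size=4)
print(machine.memory, machine.pc, machine.sp)  # [1, 2, 3, 9, 9, 0, 0, 0, 0] 0 9
```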
Compiler vs. Interpreter
3. Phases of a compiler

• The compilation process is a sequence of phases; each phase takes its input from the output of the previous phase.
• Let us look at each phase of a compiler in turn.
3.1. Lexical Analysis (Scanning)


The first phase of the compiler is performed by the scanner (lexer), which works as a text scanner.

This phase scans the source code as a stream of characters and groups the characters into meaningful sequences called lexemes; each lexeme is passed on as a token.

The scanner begins the analysis of the source program by reading the input text character by character and grouping individual characters into tokens (identifiers, integers, reserved words, delimiters, and so on).

The scanner does the following:
• It puts the program into a compact and uniform format (a stream of tokens).
• It eliminates unneeded information (such as comments).
• It processes compiler control directives (for example, including source text from a file).
• It sometimes enters preliminary information into symbol tables (for example, to register the presence of a particular label or identifier).

Examples of Tokens:
a. Key words: while, if, void, int, float, for, …
b. Identifiers: declared by the programmer
c. Operators: +, -, *, /, =, ==, <, >, <=, >=, …
d. Numeric constants: numbers such as 124, 12.35, 0.09E-23, etc.
e. Character and string constants: a single character or a string of characters enclosed in quotes.
f. Special characters: characters used as delimiters such as ( ) , ; :
Example: Show the token classes (types) output by the lexical analysis phase for each of these C++ source inputs:
a) position = initial + rate * 60 ;
b) sum = sum + unit * /* accumulate sum */ 1.2e-12 ;
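A regular-expression-driven scanner for the token classes listed above can be sketched in Python. The class names (`IDENTIFIER`, `OPERATOR`, and so on) and the exact patterns are illustrative assumptions that cover only the examples on this slide, not the full C++ lexical grammar. Running it on inputs (a) and (b) shows the comment in (b) being discarded.

```python
import re

KEYWORDS = {"while", "if", "void", "int", "float", "for"}

# Order matters: comments must be tried before the '/' and '*' operators.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("NUMBER", r"(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("CONSTANT", r"'[^'\n]*'|\"[^\"\n]*\""),  # character and string constants
    ("OPERATOR", r"==|<=|>=|!=|[-+*/=<>]"),
    ("SPECIAL", r"[(),;:{}]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL)


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments are unneeded information
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {m.start()}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers until checked
        tokens.append((kind, lexeme))
    return tokens


for line in ("position = initial + rate * 60 ;",
             "sum = sum + unit * /* accumulate sum */ 1.2e-12 ;"):
    print(tokenize(line))
```

Keywords are matched by the identifier pattern and then reclassified with a table lookup, a common design that keeps the pattern list short.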
3.2. Syntax Analysis (The parser)

• The next phase is called syntax analysis or parsing.
• It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree).
• The parser checks whether the expression formed by the tokens is syntactically correct.
• If a syntax error is found, it issues a suitable error message.
• As the syntactic structure is recognized, the parser usually builds an abstract syntax tree (AST).
Example: Show a syntax tree for the C/C++ statement
a. position = initial + rate * 60
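A hand-written recursive-descent parser is one common way to build this tree. The sketch below assumes a tiny grammar (assignment, `+ - * /` and parentheses), represents AST nodes as Python tuples, and uses a throwaway scanner; all of these are illustrative choices. Because `term` handles `*` and `/` one level below `expr`, the tree groups `rate * 60` before the addition.

```python
import re


def tokenize(text: str) -> list[str]:
    # Throwaway scanner: numbers, identifiers and single-character operators.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/=();]", text)


class Parser:
    """Recursive descent for:
    stmt   -> ID '=' expr [';']
    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/') factor)*
    factor -> ID | NUMBER | '(' expr ')'"""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        target = self.advance()
        if not target.isidentifier():
            raise SyntaxError(f"cannot assign to {target!r}")
        self.advance("=")
        value = self.expr()
        if self.peek() == ";":
            self.advance()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return ("assign", target, value)

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            self.advance(")")
            return node
        if tok[0].isdigit():
            return ("num", tok)
        if tok.isidentifier():
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("position = initial + rate * 60").statement())
# ('assign', 'position', ('+', ('id', 'initial'), ('*', ('id', 'rate'), ('num', '60'))))
```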
3.3. Semantic Analysis

• Semantic analysis checks whether the parse tree follows the rules of the language (for example, that the types of operands are compatible).
• The semantic analyzer keeps track of identifiers, their types and expressions, and checks, for example, whether identifiers are declared before use.
• The semantic analyzer produces an annotated syntax tree as output.
• The type checker checks the static semantics of each AST node.
• Example: Draw an attributed AST for position = initial + rate * 60

• Semantic errors include:
Undeclared identifier
Multiply declared identifier
Index out of bounds
Wrong number or types of arguments to a call
Incompatible types for an operation
Break statement outside a switch or loop
Goto with no matching label
• Etc.
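A type checker that produces the attributed AST asked for above, and reports two of the listed semantic errors (undeclared identifier, incompatible types), can be sketched as a recursive walk. It assumes a tuple-shaped AST like one a hand-written parser might build, only two types (`int` and `real`), and hypothetical declarations of `position`, `initial` and `rate` as `real`. Mixed `int`/`real` operands receive an explicit `inttoreal` conversion node, which is the conversion that appears in the intermediate code of the next phase.

```python
class SemanticError(Exception):
    pass


def check(node, symtab):
    """Return (annotated_node, type); each annotated node carries its type as its last element."""
    kind = node[0]
    if kind == "num":
        lexeme = node[1]
        t = "real" if any(c in lexeme for c in ".eE") else "int"
        return ("num", lexeme, t), t
    if kind == "id":
        name = node[1]
        if name not in symtab:
            raise SemanticError(f"undeclared identifier {name!r}")
        return ("id", name, symtab[name]), symtab[name]
    if kind == "assign":
        target, tt = check(("id", node[1]), symtab)
        value, vt = check(node[2], symtab)
        if tt == "real" and vt == "int":
            value, vt = ("inttoreal", value, "real"), "real"  # implicit widening
        if tt != vt:
            raise SemanticError(f"incompatible types: cannot assign {vt} to {tt}")
        return ("assign", target, value, tt), tt
    # Otherwise a binary arithmetic operator: (op, left, right).
    op, left, right = node
    left, lt = check(left, symtab)
    right, rt = check(right, symtab)
    if lt != rt:  # coerce the int operand so both sides are real
        if lt == "int":
            left, lt = ("inttoreal", left, "real"), "real"
        else:
            right, rt = ("inttoreal", right, "real"), "real"
    return (op, left, right, lt), lt


symtab = {"position": "real", "initial": "real", "rate": "real"}  # hypothetical declarations
ast = ("assign", "position", ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", "60"))))
annotated, _ = check(ast, symtab)
print(annotated)
```

Here the symbol table is a plain dictionary for brevity; the symbol-table section later in this chapter discusses a scoped version.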
3.4. Intermediate Code Generation

After semantic analysis, the compiler generates intermediate code for the source program. This intermediate code is a program for an abstract machine, lying between the high-level language and the target machine language.

It should be generated in such a way that it is easy to translate into target machine code.

One popular type of intermediate-language representation is “Three Address Code (TAC)”.
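A three-address code statement has the general form A := B op C, where A, B and C are operands (variable names, constants or compiler-generated temporaries) and op is a binary operator. Simpler forms such as A := op B (a unary operation, e.g. a type conversion) and the copy A := B are also used.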
Example: The parse tree for position = initial + rate * 60 might be converted into the following three-address sequence, where id1, id2 and id3 are the symbol-table entries for position, initial and rate:

Solution: Three Address Code
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
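The three-address sequence above can be produced by a post-order walk of the annotated tree: once a node's children have been translated, the node gets a fresh temporary. In this sketch the tree is written by hand as nested tuples, with identifiers already replaced by their symbol-table names `id1`, `id2`, `id3` and the int-to-real conversion already inserted by semantic analysis; `gen_tac` and `new_temp` are illustrative names.

```python
from itertools import count


def gen_tac(stmt) -> list[str]:
    """Translate ('assign', target, expr) into a list of three-address instructions."""
    code: list[str] = []
    temps = count(1)

    def new_temp() -> str:
        return f"temp{next(temps)}"

    def walk(node) -> str:
        """Emit code for node (children first) and return the name holding its value."""
        kind = node[0]
        if kind in ("id", "num"):
            return node[1]  # leaves are already addresses
        if kind == "inttoreal":
            arg = walk(node[1])
            result = new_temp()
            code.append(f"{result} := inttoreal({arg})")
            return result
        op, left, right = node
        lhs, rhs = walk(left), walk(right)
        result = new_temp()
        code.append(f"{result} := {lhs} {op} {rhs}")
        return result

    _, target, expr = stmt
    value = walk(expr)
    code.append(f"{target} := {value}")  # final copy into the assigned variable
    return code


tree = ("assign", "id1",
        ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttoreal", ("num", "60")))))
print("\n".join(gen_tac(tree)))
```

Because every interior node gets its own temporary, the output is simple but wasteful; removing that waste is the job of the optimization phase.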
3.5. Code Optimization

• This is an optional phase that improves the intermediate code so that the output runs faster and takes less space.
• Optimization removes unnecessary code and rearranges the sequence of statements to speed up program execution without wasting resources (CPU, memory).
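• For position = initial + rate * 60, the optimizer can, for example, perform the int-to-real conversion of 60 once at compile time and remove temp3, which is only copied into id1. One possible optimized three-address code is:
temp1 := id3 * 60.0
id1 := id2 + temp1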
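The two improvements mentioned above can be sketched as passes over three-address instructions stored as `(dest, op, arg1, arg2)` tuples: constant folding (computing `inttoreal(60)` at compile time) and elimination of a temporary that is only copied into a variable. This is a deliberately narrow sketch under those assumptions: the copy-elimination rule looks only at the immediately preceding instruction, whereas a real optimizer relies on data-flow analysis. The function name `optimize` is hypothetical.

```python
def optimize(tac):
    """tac: list of (dest, op, arg1, arg2); temporaries are names starting with 'temp'."""
    # Pass 1: constant folding. inttoreal of an integer literal is evaluated now.
    consts: dict[str, str] = {}
    folded = []
    for dest, op, a, b in tac:
        a, b = consts.get(a, a), consts.get(b, b)  # use already-folded constants
        if op == "inttoreal" and a.isdigit():
            consts[dest] = str(float(a))  # '60' -> '60.0'; the instruction disappears
        else:
            folded.append((dest, op, a, b))
    # Pass 2: 'x := t' right after 't := a op b' becomes 'x := a op b'
    # when t is a temporary used nowhere else.
    result = []
    for dest, op, a, b in folded:
        if (op == "copy" and a.startswith("temp") and result and result[-1][0] == a
                and sum(a in (i[2], i[3]) for i in folded) == 1):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    # Pass 3 (cosmetic): renumber the surviving temporaries temp1, temp2, ...
    names: dict[str, str] = {}

    def rename(x):
        if isinstance(x, str) and x.startswith("temp"):
            return names.setdefault(x, f"temp{len(names) + 1}")
        return x

    return [tuple(rename(x) for x in instr) for instr in result]


tac = [("temp1", "inttoreal", "60", None), ("temp2", "*", "id3", "temp1"),
       ("temp3", "+", "id2", "temp2"), ("id1", "copy", "temp3", None)]
for dest, op, a, b in optimize(tac):
    print(f"{dest} := {a} {op} {b}")
# temp1 := id3 * 60.0
# id1 := id2 + temp1
```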
3.6. Code Generation

• The last phase of translation is code generation.
• In this phase, the code generator takes the optimized intermediate representation and maps it to the target machine language.
• The code generator translates the intermediate code into a sequence of (generally) relocatable machine code.
• This sequence of machine instructions performs the same task as the intermediate code.

• The code generator may produce either:
  • machine code for a specific machine, or
  • assembly code for a specific machine and assembler.
• If it produces assembly code, an assembler is then used to produce the machine code.
• Example: The intermediate code for position = initial + rate * 60 may be translated into assembly code that loads rate, multiplies it by 60.0, adds initial and stores the result in position.
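One possible translation can be sketched with a naive code generator over optimized three-address tuples. The target is a hypothetical two-operand floating-point assembly language: `MOVF src, dst` moves a value, `MULF`/`ADDF src, dst` compute dst := dst op src, and `#` marks an immediate constant. Temporaries are kept in registers and each instruction simply takes a fresh register; there is no real register allocation.

```python
# Hypothetical floating-point mnemonics; 'OP src, dst' means dst := dst OP src.
MNEMONICS = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}


def gen_asm(tac):
    """tac: list of (dest, op, arg1, arg2). Temporaries live in registers,
    program variables (id1, id2, ...) live in memory."""
    asm: list[str] = []
    reg_of: dict[str, str] = {}  # temporary name -> register holding it

    def operand(x: str) -> str:
        if x in reg_of:
            return reg_of[x]  # temporary computed earlier
        if x[0].isdigit():
            return "#" + x  # immediate constant
        return x  # memory operand (a variable)

    for n, (dest, op, a, b) in enumerate(tac, start=1):
        reg = f"R{n}"  # naive: a fresh register for every instruction
        asm.append(f"MOVF {operand(a)}, {reg}")  # load the first operand
        asm.append(f"{MNEMONICS[op]} {operand(b)}, {reg}")  # reg := reg op second operand
        if dest.startswith("temp"):
            reg_of[dest] = reg  # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")  # store the result back to memory
    return asm


optimized = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
print("\n".join(gen_asm(optimized)))
```

For the example this produces five instructions: load id3, multiply by #60.0, load id2, add the register holding temp1, and store into id1.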
3.7. Symbol Table


The symbol table is a data structure maintained throughout all the phases of a compiler.

All the identifiers’ names along with their types are stored here.

The symbol table makes it easier for the compiler to quickly search the identifier record and
retrieve it.

The symbol table is also used for scope management.

A symbol table is a mechanism that allows information to be associated with identifiers and
shared among compiler phases.
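Scope management is often implemented as a stack of hash tables: entering a block pushes a new table, and a lookup searches from the innermost scope outward. The class below is a minimal sketch under that assumption; the method names and the attribute dictionary stored per identifier are illustrative, not a standard interface. Inserting the same name twice in one scope reports the "multiply declared identifier" error from the semantic-analysis section.

```python
class SymbolTable:
    """A stack of dictionaries, one per open scope; scopes[0] is the global scope."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, dict]] = [{}]

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("cannot leave the global scope")
        self.scopes.pop()

    def insert(self, name: str, type_: str) -> None:
        current = self.scopes[-1]
        if name in current:  # same name twice in one scope
            raise ValueError(f"multiply declared identifier {name!r}")
        current[name] = {"type": type_, "scope_level": len(self.scopes) - 1}

    def lookup(self, name: str):
        """Return the innermost declaration of name, or None if it is undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("rate", "float")
table.enter_scope()  # e.g. entering a function body
table.insert("rate", "int")  # shadows the global rate
print(table.lookup("rate"))  # {'type': 'int', 'scope_level': 1}
table.exit_scope()
print(table.lookup("rate"))  # {'type': 'float', 'scope_level': 0}
```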

3.8. Error Recovery and Handling



One of the most important functions of a compiler is the detection and reporting of errors in
the source program.

The error message should allow the programmer to determine exactly where the errors have occurred. Errors may occur in any of the phases of a compiler.


• A program may have the following kinds of errors at various stages:
Lexical: an incorrectly formed token, such as an illegal character or a malformed number
Syntactic: a missing semicolon or unbalanced parentheses
Semantic: an incompatible value assignment
Logical: unreachable code, an infinite loop
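To report exactly where an error occurred, a compiler tracks the line and column of each character it reads. The sketch below does this for lexical errors only: the accepted token patterns are a simplified assumption, and after an error it skips the offending character and keeps scanning (a very simple form of recovery), so several errors can be reported in one run. The function name `lexical_errors` is hypothetical.

```python
import re

# Simplified token patterns: whitespace, identifiers, integers, operators and delimiters.
TOKEN = re.compile(r"\s+|[A-Za-z_]\w*|\d+|[-+*/=<>;(){}]")


def lexical_errors(text: str) -> list[str]:
    """Return one 'line:column: message' string per character no token pattern accepts."""
    errors: list[str] = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match:
            pos = match.end()
            continue
        line = text.count("\n", 0, pos) + 1
        column = pos - (text.rfind("\n", 0, pos) + 1) + 1
        errors.append(f"{line}:{column}: lexical error: unexpected character {text[pos]!r}")
        pos += 1  # recovery: skip the bad character and continue scanning
    return errors


source = "int x = 5;\nint y = x @ 2;\nint z = $;"
print("\n".join(lexical_errors(source)))
# 2:11: lexical error: unexpected character '@'
# 3:9: lexical error: unexpected character '$'
```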
Summary on phases of the compiler: show what to expect at each phase for the expression position := initial + rate * 60.


Individual Assignment (30%)

Discuss the following compiler construction tools:


Parser Generator
Scanner Generator
Syntax-Directed Translation Engines
Automatic Code Generators
Data-Flow Analysis Engines
Compiler Construction Toolkits
Thank you!
