Language
Humans use natural languages, e.g. English and Kiswahili, to communicate with each other, but they use programming languages, e.g. Java, Pascal and C++, to communicate with a computer.
What is a Compiler?
A computer program is a set of instructions that the computer can understand and execute. In reality, computers don't understand the instructions; they simply process data. Computer languages need to be unambiguous and have an exactly defined syntax and semantics (unlike natural languages). High-level programming languages have been developed for human convenience and readability. How, then, can a computer run a program written in a high-level language?
A compiler is therefore a program that translates a high-level language program into a functionally equivalent low-level language program. It is basically a translator whose source language (i.e., the language to be translated) is a high-level language and whose target language is a low-level language. A compiler is used to implement a high-level language on a computer:
1. High-level languages are more suitable for humans to work with.
2. Computers execute instructions in machine language.
3. Programs written in high-level languages must therefore be converted into machine language before a machine can execute them.
A compiler can also be described as:
1. A program that reads a high-level input program and converts it into machine code.
2. System software that converts a source-language program into an equivalent target-language program, checking that the input program conforms to the source language specification.
Reasons for using high-level languages
1. Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
2. The compiler can spot some obvious programming mistakes.
3. Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.
4. The same program can be compiled to many different machine languages and, hence, run on many different machines.
Reasons for Studying Compiler Construction
1. It is considered a topic that you should know in order to be well-cultured in computer science.
2. A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
3. The techniques used for constructing a compiler are useful for other purposes as well.
4. There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.
[Figure: Program text → Compiler → Machine code; the compiler also reports Errors]
What is Language?
Program text is expressed in a programming (computer) language. A language consists of:
1. An alphabet, e.g. A-Z, 0-9 and special symbols such as _, + and *.
2. Words or tokens, e.g. if, {, elsif.
3. Phrases, e.g. if (x<y) then x++;
4. Rules that describe the major language elements:
   Syntax determines which phrases are in the language.
   Semantics determines what a phrase means.
How can we specify tokens or words? And how do we describe the structure of a language?
Compilation
Compilation refers to the compiler's process of translating a high-level language program into a low-level language program. This process is very complex; hence, from both a logical and an implementation point of view, it is customary to partition it into several phases. These phases are logically cohesive operations, each of which takes one representation of the source program as input and produces another representation as output.
Phases of Compilation
Compilers are large, complicated programs that can only convert programs that conform to the syntax and semantic rules of a particular language. Compilation can be broken into two stages:
Analysis (Front End): Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation
Synthesis (Back End): Code Optimization, Code Generation
Compilation Front-End
The analyzer (front end):
1. Recognises legal constructs.
2. Reports errors.
3. Produces the intermediate language (IL).
4. Generates a preliminary storage map.
Compilation Back-End
The synthesizer (back end):
1. Translates the intermediate language (IL) into target machine code.
2. Chooses the instructions required for each IL operation.
3. Decides what information to keep in the processor registers.
4. Ensures that the resulting program uses the target system efficiently.
Compiler Front-End
Syntactic Analysis
This phase takes the list of tokens produced by the lexical analysis phase and arranges these in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
The Syntactic Analyzer (or Parser) analyzes groups of related tokens ("words") that form larger constructs (phrases). These include arithmetic expressions and statements such as:
while expression do statement ;
x := a + b * 7;
It converts the linear string of tokens into structured representations such as expression trees and program flow graphs.
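A minimal recursive-descent sketch in Python for assignments such as x := a + b * 7. The grammar in the docstring is an assumption chosen so that * binds tighter than + and -, the input is assumed to be whitespace-separated tokens, and trees are nested tuples (operator, left, right). It handles assignments only, not while statements.

```python
class Parser:
    """Recursive-descent parser for an assumed assignment grammar:

        stmt   -> id ':=' expr
        expr   -> term { ('+' | '-') term }
        term   -> factor { '*' factor }
        factor -> id | int | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def stmt(self):
        target = self.advance()
        if not target.isidentifier():
            raise SyntaxError(f"expected identifier, found {target!r}")
        self.expect(":=")
        tree = (":=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # the loop makes + and - left-associative
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":  # handled one level below +, so * binds tighter
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok.isdigit():
            return int(tok)
        if tok.isidentifier():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("x := a + b * 7".split()).stmt())
# (':=', 'x', ('+', 'a', ('*', 'b', 7)))
```

Each nonterminal becomes one method, and the precedence of the operators follows from which method calls which.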
Semantic Analysis
This phase is also referred to as type checking. It analyses the syntax tree to determine whether the program violates certain consistency requirements, e.g. if a variable is used but not declared, or if it is used in a context that does not make sense given its type, such as trying to use a boolean value as a function pointer.
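A minimal sketch of such checks in Python, assuming a hypothetical tuple-based syntax tree and a dictionary of declared types. It reports undeclared variables and arithmetic on booleans, and, where an integer meets a real, it inserts an explicit int2real conversion node, as a type checker might do before intermediate code generation. The usage line assumes cur_time, start_time and cycles are all declared real.

```python
class SemanticError(Exception):
    """Raised when the tree violates a consistency requirement."""


# Hypothetical tree nodes: ("assign", name, expr), ("+" or "*", left, right),
# ("var", name) and ("num", value). decls maps names to "int", "real" or "bool".
def check(node, decls):
    """Return (annotated_node, type), raising SemanticError on a violation."""
    kind = node[0]
    if kind == "num":
        return node, "real" if isinstance(node[1], float) else "int"
    if kind == "var":
        if node[1] not in decls:
            raise SemanticError(f"variable {node[1]!r} used but not declared")
        return node, decls[node[1]]
    if kind in ("+", "*"):
        left, lt = check(node[1], decls)
        right, rt = check(node[2], decls)
        if "bool" in (lt, rt):
            raise SemanticError(f"operator {kind!r} applied to a boolean value")
        if lt == rt:
            return (kind, left, right), lt
        # Mixed int/real operands: convert the integer side to real.
        if lt == "int":
            left = ("int2real", left)
        else:
            right = ("int2real", right)
        return (kind, left, right), "real"
    if kind == "assign":
        name = node[1]
        if name not in decls:
            raise SemanticError(f"variable {name!r} assigned but not declared")
        value, vt = check(node[2], decls)
        target = decls[name]
        if vt == target:
            return ("assign", name, value), target
        if (vt, target) == ("int", "real"):
            return ("assign", name, ("int2real", value)), target
        raise SemanticError(f"cannot assign a {vt} value to {target} variable {name!r}")
    raise SemanticError(f"unknown node kind {kind!r}")


decls = {"cur_time": "real", "start_time": "real", "cycles": "real"}
tree = ("assign", "cur_time",
        ("+", ("var", "start_time"), ("*", ("var", "cycles"), ("num", 60))))
annotated, _ = check(tree, decls)
print(annotated)
```

In the printed tree, the constant 60 comes back wrapped in an int2real node, while the rest of the tree is unchanged.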
Compiler Back-End
Register allocation: The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
Machine-code generation: The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
Assembly and linking: The assembly-language code is translated into binary representation, and the addresses of variables, functions, etc. are determined.
Structure of a Compiler
[Figure: Source Language → Front End (Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator) → Intermediate Code → Back End (Code Optimizer → Target Code Generator) → Target Language]
Example Compilation
Source Code: cur_time = start_time + cycles * 60
Lexical Analysis: ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60), where ID(1), ID(2) and ID(3) stand for cur_time, start_time and cycles.
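A minimal sketch of this lexical-analysis step in Python, assuming a hypothetical token set limited to identifiers, integer literals, =, + and *. Each new identifier is entered into a symbol table (a plain list here), and ID tokens carry its 1-based position, which is one way to read the ID(1), ID(2), ID(3) notation.

```python
import re

# Assumed token classes for this sketch, tried in order at each position.
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("ADD", r"\+"),
    ("MULT", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbols); ID tokens carry 1-based symbol-table indices."""
    tokens, symbols = [], []
    pos = 0
    while pos < len(source):
        m = MASTER_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ID":
            if text not in symbols:
                symbols.append(text)  # first occurrence: new symbol-table entry
            tokens.append(("ID", symbols.index(text) + 1))
        elif kind == "INT":
            tokens.append(("INT", int(text)))
        else:
            tokens.append((kind, None))
    return tokens, symbols


tokens, symbols = tokenize("cur_time = start_time + cycles * 60")
print(" ".join(kind if value is None else f"{kind}({value})" for kind, value in tokens))
# ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)
print(symbols)  # ['cur_time', 'start_time', 'cycles']
```

A repeated identifier reuses its existing entry, so later phases can refer to a variable by its table index instead of its spelling.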
Syntax Analysis (syntax tree; each node's children are indented beneath it):
ASSIGN
  ID(1)
  ADD
    ID(2)
    MULT
      ID(3)
      INT(60)
Semantic Analysis: the integer 60 is converted to real (int2real) so that it can be multiplied by a real operand:
ASSIGN
  ID(1)
  ADD
    ID(2)
    MULT
      ID(3)
      int2real
        INT(60)
Intermediate Code:
temp1 = int2real(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
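The intermediate code above can be produced by a post-order walk of the annotated tree: each interior node is translated after its children and its result is placed in a fresh temporary. A minimal sketch, assuming a hypothetical tuple representation in which identifiers appear under their names id1, id2, id3 and constants as Python numbers:

```python
import itertools


def gen_three_address(tree):
    """Translate ("assign", target, expr) into a list of three-address instructions."""
    code = []
    temps = itertools.count(1)  # numbers for temp1, temp2, ...

    def gen(node):
        """Emit code for node and return the name or constant holding its value."""
        if isinstance(node, (str, int, float)):
            return str(node)  # identifiers and constants are used directly
        if node[0] == "int2real":
            arg = gen(node[1])
            temp = f"temp{next(temps)}"
            code.append(f"{temp} = int2real({arg})")
            return temp
        op, left, right = node
        left_val = gen(left)  # children first (post-order)
        right_val = gen(right)
        temp = f"temp{next(temps)}"
        code.append(f"{temp} = {left_val} {op} {right_val}")
        return temp

    _, target, expr = tree
    result = gen(expr)
    code.append(f"{target} = {result}")
    return code


annotated = ("assign", "id1", ("+", "id2", ("*", "id3", ("int2real", 60))))
for line in gen_three_address(annotated):
    print(line)
# temp1 = int2real(60)
# temp2 = id3 * temp1
# temp3 = id2 + temp2
# id1 = temp3
```

Because every operator gets its own temporary, this output is deliberately naive; removing the redundancy is left to the optimizer.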
Code Optimization (step 0): the optimizer starts from the intermediate code unchanged:
temp1 = int2real(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
Optimized Code (step 1): the conversion int2real(60) is done once, at compile time:
temp1 = 60.0
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
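Step 1 is an instance of constant folding: an operation whose operands are all known at compile time is evaluated by the compiler. A minimal sketch, assuming a hypothetical (dest, op, arg1, arg2) tuple encoding of the three-address instructions, where op "copy" stands for a plain assignment dest = arg1:

```python
def is_const(x):
    return isinstance(x, (int, float))


def fold_constants(code):
    """Evaluate at compile time any operation whose operands are all constants."""
    folded = []
    for dest, op, a, b in code:
        if op == "int2real" and is_const(a):
            folded.append((dest, "copy", float(a), None))
        elif op in ("+", "*") and is_const(a) and is_const(b):
            folded.append((dest, "copy", a + b if op == "+" else a * b, None))
        else:
            folded.append((dest, op, a, b))
    return folded


def show(code):
    """Render instructions in the notation used on the slides."""
    lines = []
    for dest, op, a, b in code:
        if op == "copy":
            lines.append(f"{dest} = {a}")
        elif op == "int2real":
            lines.append(f"{dest} = int2real({a})")
        else:
            lines.append(f"{dest} = {a} {op} {b}")
    return lines


intermediate = [
    ("temp1", "int2real", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
print("\n".join(show(fold_constants(intermediate))))
# temp1 = 60.0
# temp2 = id3 * temp1
# temp3 = id2 + temp2
# id1 = temp3
```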
Optimized Code (step 2): temp1 is replaced by its value 60.0, so its assignment is no longer needed:
temp2 = id3 * 60.0
temp3 = id2 + temp2
id1 = temp3
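Step 2 combines constant propagation (substituting the known value of temp1 into its use) with dead-code elimination (dropping the assignment nobody reads any more). A minimal sketch, assuming a hypothetical (dest, op, arg1, arg2) tuple encoding where op "copy" means dest = arg1, and assuming that every temporary (a name starting with "temp") is assigned exactly once in straight-line code:

```python
def propagate_constants(code):
    """Replace uses of constant-valued temporaries by the constant itself.

    Relies on the single-assignment assumption above: once a temporary's
    constant value is recorded, every later use is rewritten, so its own
    assignment can be dropped.
    """
    constants = {}
    result = []
    for dest, op, a, b in code:
        if isinstance(a, str):
            a = constants.get(a, a)
        if isinstance(b, str):
            b = constants.get(b, b)
        if op == "copy" and isinstance(a, (int, float)) and dest.startswith("temp"):
            constants[dest] = a  # remember the value; the assignment is now dead
            continue
        result.append((dest, op, a, b))
    return result


step1 = [
    ("temp1", "copy", 60.0, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instr in propagate_constants(step1):
    print(instr)
# ('temp2', '*', 'id3', 60.0)
# ('temp3', '+', 'id2', 'temp2')
# ('id1', 'copy', 'temp3', None)
```

Program variables such as id1 are never removed this way, since code outside this fragment may read them.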
Optimized Code (step 3): temp3 only passes its value on to id1, so the sum is assigned to id1 directly:
temp2 = id3 * 60.0
id1 = id2 + temp2
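Step 3 removes the copy id1 = temp3 by letting the addition assign id1 directly. This is safe only when temp3 is used nowhere else. A minimal sketch, assuming a hypothetical (dest, op, arg1, arg2) tuple encoding where op "copy" means dest = arg1, and handling only the case where the copy immediately follows the temporary's definition:

```python
from collections import Counter


def eliminate_copies(code):
    """Rewrite 't = a op b; x = t' as 'x = a op b' when temporary t has no other use."""
    # Count how often each name is read as an operand.
    uses = Counter(arg for _, _, a, b in code for arg in (a, b) if isinstance(arg, str))
    result = []
    for dest, op, a, b in code:
        if (op == "copy" and isinstance(a, str) and a.startswith("temp")
                and uses[a] == 1 and result and result[-1][0] == a):
            # Retarget the previous instruction at dest and drop the copy.
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result


step2 = [
    ("temp2", "*", "id3", 60.0),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instr in eliminate_copies(step2):
    print(instr)
# ('temp2', '*', 'id3', 60.0)
# ('id1', '+', 'id2', 'temp2')
```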
Optimized Code (final, with the remaining temporary renamed temp1):
temp1 = id3 * 60.0
id1 = id2 + temp1
Target Code (F marks floating-point instructions, operands are written source first and destination second, and # marks an immediate constant):
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
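A naive sketch of target-code generation for the optimized code, assuming a hypothetical (dest, op, arg1, arg2) tuple encoding of the instructions, the two-operand "OP source, destination" form used above, only + and *, and an unlimited supply of registers. It hands out registers in order, so it uses R1 where the slide uses R2 and vice versa; real register allocators are considerably more involved.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}  # floating-point add and multiply


def operand(x, regs):
    """Constants become immediates (#60.0); temporaries become their registers."""
    if isinstance(x, (int, float)):
        return f"#{x}"
    return regs.get(x, x)


def gen_target(code):
    """Emit two-operand instructions, using one fresh register per instruction."""
    regs = {}  # temporary name -> register that holds its value
    asm = []
    for n, (dest, op, a, b) in enumerate(code, start=1):
        reg = f"R{n}"
        asm.append(f"MOVF {operand(a, regs)}, {reg}")  # load the first operand
        asm.append(f"{OPCODES[op]} {operand(b, regs)}, {reg}")  # reg = reg op b
        if dest.startswith("temp"):
            regs[dest] = reg  # keep temporaries in registers
        else:
            asm.append(f"MOVF {reg}, {dest}")  # store program variables to memory
    return asm


optimized = [("temp1", "*", "id3", 60.0), ("id1", "+", "id2", "temp1")]
print("\n".join(gen_target(optimized)))
# MOVF id3, R1
# MULF #60.0, R1
# MOVF id2, R2
# ADDF R1, R2
# MOVF R2, id1
```

Since only the commutative operators + and * are supported, computing "reg = reg op b" gives the correct result regardless of operand order.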
PASCAL Example
Grammar (specified in BNF)
A BNF grammar contains a set of rules, each of which defines the syntax of some construct in the programming language. In the notation:
::= means "is defined to be"
< > enclose nonterminal symbols (constructs defined in the grammar)
symbols without angle brackets are terminal symbols
[Figures: sample program; its lexical analysis; its syntax analysis]
<id-list> ::= id | <id-list> , id
<stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
<stmt> ::= <assign> | <read> | <write> | <for>
<exp> ::= <term> | <exp> + <term> | <exp> - <term>
<term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor> ::= id | int | ( <exp> )
<read> ::= READ ( <id-list> )
<write> ::= WRITE ( <id-list> )
<for> ::= FOR <index-exp> DO <body>
<index-exp> ::= id := <exp> TO <exp>
<body> ::= <stmt> | BEGIN <stmt-list> END
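Recursive descent cannot use a left-recursive rule such as <id-list> ::= id | <id-list> , id directly, because the procedure for <id-list> would call itself before consuming any token. A common remedy is to parse it as the loop id { , id } while still building the left-leaning tree the rule describes. A minimal Python sketch for the <read> and <write> rules, assuming a hypothetical regular-expression tokenizer and tuple-based tree nodes; the function and class names are illustrative only:

```python
import re

# Reserved words taken from the grammar's terminals; they cannot be identifiers.
KEYWORDS = {"READ", "WRITE", "FOR", "DO", "TO", "BEGIN", "END", "DIV"}


def tokenize(text):
    """Split text into words, integers and symbols; unmatched characters (e.g. spaces) are skipped."""
    return re.findall(r"[A-Za-z]\w*|\d+|:=|[(),;+\-*]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not tok[0].isalpha():
            raise SyntaxError(f"expected identifier, found {tok!r}")
        self.pos += 1
        return tok

    def id_list(self):
        # <id-list> ::= id | <id-list> , id is left-recursive, so it is parsed
        # as the loop id { , id } while still building a left-leaning tree.
        node = ("id-list", self.ident())
        while self.peek() == ",":
            self.pos += 1
            node = ("id-list", node, self.ident())
        return node

    def io_stmt(self):
        # <read> ::= READ ( <id-list> )   and   <write> ::= WRITE ( <id-list> )
        keyword = self.peek()
        if keyword not in ("READ", "WRITE"):
            raise SyntaxError(f"expected READ or WRITE, found {keyword!r}")
        self.pos += 1
        self.expect("(")
        ids = self.id_list()
        self.expect(")")
        return (keyword.lower(), ids)


def parse_io(text):
    parser = Parser(tokenize(text))
    tree = parser.io_stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse_io("WRITE(A, B)"))
# ('write', ('id-list', ('id-list', 'A'), 'B'))
```

The nested ("id-list", ...) tuples mirror the shape of the parse tree that the left-recursive rule produces, which is the shape expected when drawing parse trees by hand.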
Exercise
Draw the parse tree for the example Pascal program, using the specified BNF Pascal grammar.