Professional Documents
Culture Documents
Kenesa B. (getkennyo@gmail.com)
CHAPTER ONE
INTRODUCTION
O U T L I N E
q Introduction
q Language processors
§ Compilers
§ Interpreters
q Language processing system
q Phases of compilers
§ Lexical analysis, Syntax analysis, Semantic analysis, IC generator, IC
optimizer, Code generator, Code optimizer
q Error handling routine
q Compiler construction tools
program
compiler compiler
compiler Unix
Windows
Macs
Source Program
Source Program
Interpreter Output
Input
§ Interpreter: § Compiler:
Ø Interpreter takes one statement then Ø While compiler translates the entire
translates it and executes it and then takes program in one go and then executes it.
another statement. Ø Compiler generates the error report after
Ø Interpreter will stop the translation after it the translation of the entire program.
gets the first error. Ø Compiler takes a large amount of time in
Ø Interpreter takes less time to analyze the analyzing and processing the high level
source code. language code.
Ø Over all execution speed is less. Ø Overall execution time is faster.
• E.g. Javascript, PERL, PHP, Python, • E.g. FORTRAN, C, C++, COBOL, VHDL, etc.
Ruby, awk, sed, shells (bash)...etc.
Kenesa B. (Ambo University) Compiler Design 10
Hybrid approaches
q Java language processors combine compilation and interpretation.
§ A Java source program may first be compiled into an intermediate form called
bytecodes.
• a compact representation intended to decrease download times for Java applications.
§ The bytecodes are then interpreted by a virtual machine.
§ Java applications execute by running the bytecode on the corresponding Java
Virtual Machine (jvm), an interpreter for bytecode.
q Benefit: bytecodes compiled on one machine can be interpreted on another
machine, perhaps across a network.
q Just-in-time compilers (JIT), translate the bytecodes into machine language
immediately before they run the intermediate program to process the input.
§ In order to achieve faster processing of inputs to outputs
§ Compile some or all byte codes to native code
Kenesa B. (Ambo University) Compiler Design 11
Hybrid approaches
Source Program e r
et
pr
t er
In
Java Program Java bytecode
compiler Interpreter
Translator
Inte
rpre
t er
Intermediate Program
Input
A hybrid compiler 12
Language Processing System
q In addition to a compiler, several other programs may be required to create an
executable target program
q A source program may be divided into modules stored in separate files.
q The task of collecting the source program is sometimes entrusted to a separate
program, called a preprocessor.
§ A preprocessor program executes automatically before the compiler’s
translation phase begins.
§ Such a pre-processor:
• can delete comments
• macro processing (substitutions)
• include other files
• produce input to a compiler...
• E.g. Preprocessor will remove all the #include<...> by file inclusion and #define<...>
by macro expansion in C (C++)
Kenesa B. (Ambo University) Compiler Design 13
Language Processing System
q The modified source program is then fed to a compiler.
q The compiler may produce an assembly-language program as its output
§ because assembly language is easier to produce as output and is easier to
debug.
q The assembly language is then processed by a program called an assembler that
produces relocatable machine code as its output.
q Large programs are often compiled in pieces
§ so the relocatable machine code may have to be linked together with other
relocatable object files and library files
q The linker resolves external memory addresses, where the code in one file may
refer to a location in another file.
q The loader then puts together all of the executable object files into memory for
execution.
Kenesa B. (Ambo University) Compiler Design 14
Language Processing System
Source program Preprocessor
(C or C++ program)
Modified source program (C or C++ program with macro
substitutions and file inclusions)
Compiler
Assembler
Parser
Intermediate code
optimizer
Syntax tree
Intermediate code
Error Semantic Error
handler analyzer handler
Target code generator
(Target code optimizer)
Annotated
tree
Target code
• Notes: tokens are atomic items, not character strings; comments & whitespace are not tokens
q The lexical analyzer reads the stream of characters making up the source program
and groups the characters into meaningful sequences called lexemes.
q For each lexeme, the lexical analyzer produces as output a token of the form
(token-name, attribute-value) that it passes on to the subsequent phase, syntax
analysis
q A scanner may perform other operations along with the recognition of tokens.
§ It may enter identifiers and literals into the symbol table
Kenesa B. (Ambo University) Compiler Design 23
Lexical Analysis - Scanning
q the first component token-name is an abstract symbol that is used during syntax
analysis
q the second component attribute-value points to an entry in the symbol table for
this token.
§ Information from the symbol-table entry is needed for semantic analysis and code generation
q Example: position = initial + rate * 60
q The characters in this assignment could be grouped into the following lexemes:
§ position is a lexeme that would be mapped into a token (id, 1)
– where id is an abstract symbol standing for identifier and 1 points to the
symbol table entry for position.
– the symbol-table entry for an identifier holds information about the
identifier, such as its name and type.
(id, 1) +
(id, 2) *
(id, 3) 60
Annotated (id, 1) +
syntax tree
*
(id, 2)
inttofloat
(id, 3)
60
Intermediate Code (IC) Generation
q During translation a compiler may construct one or more intermediate
representations
§ which can have a variety of forms: Three-address code (TAC), quadruples,
triple, P-code for an abstract machine, Tree or DAG representation
• e.g. Syntax trees are a form of intermediate representation (during syntax
and semantic analysis)
q After syntax and semantic analysis, many compilers generate an explicit low-level
or machine-like intermediate representation,
§ which we can think of as a program for an abstract machine.
q This intermediate representation should have two important properties:
§ it should be easy to produce and
§ it should be easy to translate into the target machine.