net/publication/332726973
1 author:
Rajendra Kumar
Chandigarh University
All content following this page was uploaded by Rajendra Kumar on 29 April 2019.
THE COMPILER
A compiler is software that takes a program written in a high-level language and translates it into an equivalent
program in a target language. More specifically, a compiler takes a computer program and translates it into an
object program. Other tools associated with the compiler are responsible for turning the object program into
executable form.
Fig. 1.1 Major function of Compiler [figure label: error messages]
Source program – It is normally a program written in a high-level programming language, which provides a set of
rules, symbols, and special words used to construct a computer program.
Target program – It is normally the equivalent program in machine code. It contains the binary representation
of the instructions that the computer hardware can perform.
Error Message – A message issued by the compiler when it detects errors, such as syntax errors, in the source program.
Assembly Language – It is a middle-level programming language in which mnemonics are used to represent
each of the machine language instructions for a specific computer. Assembly language programs also allow the
user to use text names for data rather than having to remember their memory addresses. An Assembler is a
computer program that translates an assembly language program into machine code.
Table 1.1 The machine and assembly language codes
Machine Language    Assembly Language
100101              ADD
011001              SUB
001101              MPY
100111              CMP
100011              JMP
110011              JNZ
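The mapping in Table 1.1 can be sketched as a simple lookup, which is the core of what an assembler does for each mnemonic. This is only an illustrative sketch: the function name assemble is hypothetical, and the 6-bit codes are the ones listed in Table 1.1, not those of any real instruction set.

```python
# Opcode table copied from Table 1.1. The bit patterns are the text's
# illustration and are not taken from a real machine.
OPCODES = {
    "ADD": "100101",
    "SUB": "011001",
    "MPY": "001101",
    "CMP": "100111",
    "JMP": "100011",
    "JNZ": "110011",
}


def assemble(mnemonics):
    """Translate a list of mnemonics into their machine-language bit patterns."""
    machine_code = []
    for mnemonic in mnemonics:
        key = mnemonic.strip().upper()
        if key not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic!r}")
        machine_code.append(OPCODES[key])
    return machine_code


print(assemble(["ADD", "SUB", "JMP"]))
# ['100101', '011001', '100011']
```

A real assembler also handles operands, labels, and the text names for data mentioned above; those are left out to keep the sketch short.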
Typically, a compiler includes several functional parts. For example, a conventional compiler may include a
lexical analyzer that scans the source program and identifies its successive “tokens”.
A conventional compiler also includes a parser (syntactical analyzer), which takes as input a grammar defining
the language being compiled and a series of actions associated with the grammar. The syntactical analyzer builds
a “parse tree” for the statements in the source program in accordance with the grammar productions and actions.
For each statement in the input source program, the syntactical analyzer generates a parse tree of the source input
in a recursive, “bottom-up” manner in accordance with relevant productions and actions in the grammar. Thus, the
[Figure: program development flowchart. Boxes: Enter Source Code (Editor); Edit Source Code xyz.c; Compiler; Syntax Errors? (Yes/No); Debugger; Re-edit source program (HLL); Linker; Linking Errors? (Yes); Loader; Execution; Stop]
The Java, CC and cc compilers do not have their own editors. The Java compiler compiles programs written in a
text editor such as Notepad, and code written using the ed or vi editors can be compiled by CC (the C++ compiler
on Unix) and cc (the C compiler on Unix).
[Figure: program execution. Labels: Input; Machine Language Program xyz.exe; Computer Executes; Linker/Loader]
[Figure: translation and execution. Labels: Input source program; Translator; Object program; Linker, Loader, and Run-time System; Output]
[Figure: phases of a compiler. Labels: Lexical Analyzer; Syntax Analyzer; Semantic Analyzer; Code Optimizer; Code Generator; Target Program; Symbol Table Manager; Error Handler]
Lexical analyzer
The lexical analyzer takes the source program as input and produces a stream of tokens. It reads the source
program character by character and returns the tokens of the source program. The process of generating and
returning the tokens is called lexical analysis. A token describes a pattern of characters having the
same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters and so on).
Tokens are the terminal symbols of the grammar. Lexical analysis also has to deal with white space, comments, and
reserved-word identification; modern lexical analyzer generators handle these problems. Modern lexical analyzers
remove non-grammatical elements, such as spaces and comments, from the stream. A lexical analyzer is implemented
with a finite state automaton (FSA), which consists of a finite set of states and transition functions that move
between states on each input character. Let us consider a high-level language assignment statement
newval := old_val + 12
Many lexemes map to the same token; for example, “x” and “abc” are both identifiers. Note that some lexemes might
match more than one pattern (a reserved word also matches the identifier pattern), and this ambiguity must be
resolved. Since tokens are the terminals of the CFG, they must be “produced” by the lexical phase with their
synthesized attributes in place.
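The tokenization of the assignment statement above can be sketched with regular expressions. This is a minimal illustration, not the text's own implementation: the token names, the keyword list, and the function tokenize are all assumptions. Patterns are tried in order, which is one common way to resolve a lexeme that matches more than one pattern.

```python
import re

# Token patterns are tried in order, so KEYWORD wins over IDENTIFIER when a
# lexeme matches both. The keyword list is a made-up example.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|then|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),       # white space is dropped from the token stream
    ("MISMATCH", r"."),     # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return the list of (token, lexeme) pairs found in the source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("newval := old_val + 12"))
# [('IDENTIFIER', 'newval'), ('ASSIGN', ':='), ('IDENTIFIER', 'old_val'), ('PLUS', '+'), ('NUMBER', '12')]
```

Both “x” and “abc” come back as IDENTIFIER tokens with different lexemes, which is the many-lexemes-to-one-token relationship described above.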
The output of a phase is the input to the next phase. For example, the output of the lexical analyzer is the input to
the syntax analyzer, the output of the syntax analyzer is the input to the semantic analyzer, and so on. Each phase
transforms the source program from one representation into another. All phases communicate with the error handler
and the symbol table.
The phases of a compiler are grouped into a front end and a back end. The front end includes all analysis phases and
the intermediate code generator. The back end includes the code optimization phase and the final code generation
phase. The front end analyzes the source program and produces intermediate code, while the back end synthesizes
the target program from the intermediate code.
Syntax Analyzer
A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program. In other words, it
takes the output of the lexical analyzer (a list of tokens) and produces a parse tree. A syntax analyzer is
also called a parser. A parse tree describes the syntactic structure of the program. The syntax is defined as the
physical layout of the source program, and grammars describe the syntax of a language precisely. The two kinds of
grammars that compiler writers use most are regular and context-free grammars.
Fig. 1.7 Derivation trees for newval := old_val + 20 and “tim ate the big ball” [figure labels: identifier, number, old_value, 20, sentence]
The syntax of a language is specified by a context-free grammar (CFG). The rules in a CFG are mostly
recursive. A syntax analyzer checks whether a given program satisfies the rules implied by a CFG. If it
does, the syntax analyzer creates a parse tree for the given program. For example, we use BNF (Backus-Naur
Form) to specify a CFG:
assign_stmt → identifier := expression
expression → identifier
expression → number
expression → expression + expression
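A parser for this small grammar can be sketched as follows. The sketch is top-down (recursive descent) rather than bottom-up, and it handles the left-recursive rule for + with a loop, so it is a swapped-in technique for illustration and not the parsing method the text describes. The token names (IDENTIFIER, NUMBER, ASSIGN, PLUS), the tuple-based tree, and the function parse_assign are assumptions.

```python
def parse_assign(tokens):
    """Parse a list of (token, lexeme) pairs for: assign_stmt -> identifier := expression."""
    pos = 0

    def expect(kind):
        # Consume the next token if it has the expected kind, else report an error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def operand():
        # expression -> identifier | number
        if pos < len(tokens) and tokens[pos][0] == "NUMBER":
            return ("number", expect("NUMBER"))
        return ("identifier", expect("IDENTIFIER"))

    def expression():
        # expression -> expression + expression, written as a loop so the
        # left-recursive rule does not recurse forever; + groups to the left.
        node = operand()
        while pos < len(tokens) and tokens[pos][0] == "PLUS":
            expect("PLUS")
            node = ("+", node, operand())
        return node

    target = ("identifier", expect("IDENTIFIER"))
    expect("ASSIGN")
    tree = (":=", target, expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return tree


tokens = [("IDENTIFIER", "newval"), ("ASSIGN", ":="),
          ("IDENTIFIER", "old_val"), ("PLUS", "+"), ("NUMBER", "20")]
print(parse_assign(tokens))
# (':=', ('identifier', 'newval'), ('+', ('identifier', 'old_val'), ('number', '20')))
```

The rule expression → expression + expression is ambiguous as written; choosing left grouping in the loop is one way to settle it.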
A syntax-directed translation traverses a syntax tree and builds a translation in the process.
Type checking is an important part of the semantic analyzer. Normally, semantic information cannot be represented
by the context-free grammars used in syntax analysis. Instead, these context-free grammars are
augmented with attributes (semantic rules):
❖ the result is a syntax-directed translation,
❖ attribute grammars
For example,
newvalue := old_value + 20
The type of the identifier newvalue must match the type of the expression old_value + 20.
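The type check described above can be sketched as a walk over the tree of an assignment. The symbol table contents, the type names integer and real, the tuple-based tree, and the rule that both sides must have exactly the same type (no implicit conversion) are all assumptions made for this illustration.

```python
# Hypothetical symbol table: identifier name -> declared type.
SYMBOLS = {"newvalue": "integer", "old_value": "integer", "ratio": "real"}


def expression_type(node, symbols):
    """Compute the type of an expression node (a synthesized attribute)."""
    kind = node[0]
    if kind == "number":
        return "real" if "." in node[1] else "integer"
    if kind == "identifier":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "+":
        left = expression_type(node[1], symbols)
        right = expression_type(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")


def check_assignment(tree, symbols):
    """Check that the target of := has the same type as the expression."""
    _, target, expr = tree
    target_type = expression_type(target, symbols)
    expr_type = expression_type(expr, symbols)
    if target_type != expr_type:
        raise TypeError(f"cannot assign {expr_type} to {target_type} {target[1]!r}")
    return target_type


tree = (":=", ("identifier", "newvalue"),
        ("+", ("identifier", "old_value"), ("number", "20")))
print(check_assignment(tree, SYMBOLS))
# integer
```

Replacing old_value with ratio in the tree makes the check fail, because a real cannot be added to the integer 20 under the strict rule assumed here.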
Intermediate Code Generator
A compiler may produce an explicit intermediate codes representing the source program. Intermediate code
generator takes a tree as an input produced by semantic analyzer and produces intermediate code (in assembly
language). The level of intermediate codes is close to the level of machine codes. For example,
[Figure: Explicit Parse tree (Abstract Syntax Tree). Node labels: id1, id2, id3, old_value, fact, +, *, 1]
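As a rough illustration of intermediate code generation, the sketch below walks the tree of an assignment and emits three-address instructions, introducing one temporary per operator. The tuple-based tree, the temporary names temp1, temp2, and the instruction syntax are assumptions; this is not the text's own intermediate language.

```python
def generate(tree):
    """Return a list of three-address instructions for an assignment tree."""
    code = []
    temp_count = 0

    def emit_expression(node):
        # Leaves are used directly; each operator node gets a new temporary.
        nonlocal temp_count
        if node[0] in ("identifier", "number"):
            return node[1]
        op, left, right = node
        left_name = emit_expression(left)
        right_name = emit_expression(right)
        temp_count += 1
        temp = f"temp{temp_count}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = emit_expression(expr)
    code.append(f"{target[1]} := {result}")
    return code


tree = (":=", ("identifier", "newvalue"),
        ("+", ("identifier", "old_value"), ("number", "20")))
for line in generate(tree):
    print(line)
# temp1 := old_value + 20
# newvalue := temp1
```

A code optimizer could later remove the temporary and assign the sum to newvalue directly; the sketch keeps it to show the one-operator-per-instruction shape.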
[Figure labels: Source Code; Language Preprocessor (processing of #include, #define, #ifdef, etc.; trivial errors); Preprocessed Source Code; Lexical Analysis; Syntax Analysis (syntax errors); Semantic Analysis; Program; Declarations; Statements; Front end; Back end; Code]
Fig. 1.10 Compiler consisting of front end and back end