Professional Documents
Culture Documents
1 Introduction to Compiling
1.1 Compilers
A compiler is a program that reads a program written in one language –the source language and translates it into an
equivalent program in another language-the target language. As an important part of this translation process, the
compiler reports to its user the presence of errors in the source program.
The Analysis and Synthesis Model of compilation
There are two parts of compilation: analysis and synthesis. The analysis part breaks up the source program into
constituent pieces and create and intermediate representation of the source program. The synthesis part constructs the
desired target program from the intermediate representation. During analysis the operations implied by the source
program are determines and recorded in a hierarchical structure called a tree.
1.2 Analysis of the source program
1. Linear Analysis/Lexical Analysis: in which the stream of characters making up the source program is read from
left to right and grouped into tokens that are sequences of characters having a collective meaning. It produces
pairs- a word and its part of speech. For example the input x=x+y becomes
<id, x>
<assign, =>
<id, x>
<op, +>
<id,y> we call the pair”<token type,word>” a token. Typical tokens are number, identifier, +,-,new, while, if etc.
Blanks separating the characters of tokens would normally be eliminated during lexical analysis.
2. Hierarchical Analysis/Parsing/syntax analysis: in which characters or tokens are grouped hierarchically into
nested collections with collective meaning. It involves grouping the tokens of the source program into
grammatical phrases that are used by the compiler to synthesize output.
The syntax of most programming languages is specified using CFG or BNF (Backus Naur Form). CFG is specified
with a grammar G=(S,N,T,P)
S is the start symbol
N is a set of non-terminal
T is set of terminal symbols or words
P is asset of productions or rewrite rules.
Example:
1) goalexpr // S=goal
2) expr expr op term | term // T={number, id, +, -}
3) termnumber |id // N={goal,expr, term,op}
4) op+ |- // P={1,2,3,4,5,6,7}
Given a CFG we can derive senetnces by repeated substitution. Consider the sentence x+2-y
To recognize a valid sentence in some CFG, we reverse this process and build up a parse, thus the name parser. (Parse is
to break up a sentence or group of words into separate components). A parse can be represented by a tree called parse
tree.
3. Semantic Analysis: in which certain checks are performed to ensure that the components of a program fit
together meaningfully. It checks the source program for semantic errors and gathers type information for the
subsequent code-generation phase. An important component of semantic analysis is type checking. For
example, many programming language definitions require a compiler to report an error every time a real
number is used to index an array.
1.3 Phases of compiler