
Chapter One

Introduction to Compiling

Introduction
The use of computer languages is an essential link in
the chain between human and computer.

- Translators for programming languages – the various classes of translator (assemblers, compilers, interpreters)
- Compiler generators – tools that are available to help automate the construction of translators for programming languages.
Translators or compilers

Translators or compilers are programs which accept (as data) a textual representation of an algorithm expressed in a source language, and which produce (as primary output) a representation of the same algorithm expressed in another language, the object or target language.

Phases in developing and using programs written in high-level languages:
- Compilation (compile-time)
- Execution (run-time)
A translator, being a program in its own right, must
itself be written in a computer language, known as
its host or implementation language.

Translators can be developed:
- From scratch in machine language (rare to find today). For any new system one has to come to terms with the machine language and machine architecture of that system. (Disadvantage)
- In high-level languages. Translators for new machines are now invariably developed this way, often using the techniques of cross-compilation and bootstrapping. (Advantage)
Languages involved in the development of
translators:
- The source language to be translated
- The object or target language to be generated
- The host language to be used for implementing
the translator.

The Fortran compilers – the first major translators – were developed by Backus and his colleagues at IBM in the 1950s.
Classes of Translators
Assembler: translators that map low-level language
instructions into machine code which can then be
executed directly.

Compiler: translators that map high-level language instructions into machine code which can then be executed directly.

Decompiler: translators which attempt to take object code at a low level and regenerate source code at a higher level.
 
Any compilation can be broken down into two
major tasks:
Analysis: Discover the structure and primitives of
the source program, determining its meaning.

- concerns itself solely with the properties of the source language
- converts the program text submitted by the programmer into an abstract representation
Structural analysis: determine the static structure of the source program.
Semantic analysis: fix the additional information and check its consistency.
Two subtasks of structural analysis:
Lexical analysis: deals with the basic symbols of the source program
- is described in terms of finite-state automata.
Syntactic analysis or parsing: deals with the static structure of the program
- is described in terms of pushdown automata (uses a stack); a small sketch follows.
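
Why the stack matters: bracketed constructs can nest to any depth, which a finite-state automaton cannot track but a pushdown automaton can. Below is a minimal Python sketch of the idea; the function name and bracket set are chosen for illustration, not taken from any particular compiler.

    def brackets_balanced(text):
        # Pushdown-style recognition: the stack remembers open brackets.
        pairs = {')': '(', ']': '[', '}': '{'}
        stack = []
        for ch in text:
            if ch in '([{':
                stack.append(ch)              # push an opener
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False              # closer without matching opener
        return not stack                      # every opener must be closed

    print(brackets_balanced('f(a[i]) { return x; }'))  # True
    print(brackets_balanced('f(a[i)]'))                # False

Real parsers track grammar rules rather than single characters, but the same stack discipline underlies them.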

Synthesis: Create a target program equivalent to the source program.
Two subtasks of synthesis:
- code generation
- assembly
Code generation: transforms the abstract source
program appearing at the analysis/synthesis
interface into an equivalent target machine
program.
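
A minimal sketch of code generation in Python, assuming a made-up abstract representation (nested tuples for operators, plain values at the leaves) and a hypothetical stack machine with PUSH, ADD and MUL instructions:

    def gen(node, out):
        # Post-order walk: generate code for the operands first,
        # then for the operator itself.
        if isinstance(node, tuple):
            op, left, right = node
            gen(left, out)
            gen(right, out)
            out.append({'+': 'ADD', '*': 'MUL'}[op])
        else:
            out.append(f'PUSH {node}')

    code = []
    gen(('+', 'a', ('*', 'b', 'c')), code)   # abstract form of a + b * c
    print('\n'.join(code))
    # PUSH a / PUSH b / PUSH c / MUL / ADD

Note that the tree's shape already encodes that * binds tighter than +, so the generator needs no knowledge of operator precedence.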

Assembly resolves all target addressing and converts the target machine instructions into an appropriate output format.
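
A minimal two-pass sketch of the assembly step in Python; the instruction set (LOAD, JNZ, HALT) and the LABEL pseudo-operation are invented for illustration. Pass one records the address of each label; pass two replaces symbolic jump targets with numeric addresses:

    program = [
        ('LABEL', 'loop'),
        ('LOAD',  'x'),
        ('JNZ',   'loop'),        # target given symbolically
        ('HALT',  None),
    ]

    # Pass 1: assign addresses; a label marks the next real instruction.
    addresses, pc = {}, 0
    for op, arg in program:
        if op == 'LABEL':
            addresses[arg] = pc
        else:
            pc += 1

    # Pass 2: emit instructions with symbolic targets resolved.
    for op, arg in program:
        if op == 'LABEL':
            continue
        if op == 'JNZ':
            arg = addresses[arg]  # name becomes a numeric address
        print(op, '' if arg is None else arg)

This prints LOAD x, JNZ 0, HALT: the symbolic reference 'loop' has been resolved to address 0.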
Phases in Translation
The components of the translator that handle these
two major phases (analytic and synthetic) are said
to comprise the front end and the back end of the
compiler.
The front end is largely independent of the target
machine.
The back end depends very heavily on the target
machine.
Lexical analysis: This is the initial part of reading and
analysing the program text:
The text is read and divided into tokens.

The lexical analyser or scanner is the section that


fuses characters of the source text into groups that
logically make up the tokens of the language –
symbols like identifiers, strings, numeric constants
,key words, operators (like <=) and so on.
Fig. Phases of a Compiler
Lexical Analysis
A lexical analyser or scanner is a program that
groups sequences of characters into lexemes,
and outputs (to the syntax analyser) a sequence of
tokens.

Tokens are symbolic names for the entities that make up the text of the program;
e.g. if for the keyword if, and id for any identifier.
These make up the output of the lexical analyser.
A pattern is a rule that specifies when a sequence of characters from the input constitutes a token; e.g.
- the sequence i, f for the token if,
- any sequence of alphanumerics starting with a letter for the token id.

A lexeme is a sequence of characters from the input that matches a pattern (and hence constitutes an instance of a token);
- for example: if matches the pattern for if,
- and foo123bar matches the pattern for id.

Whitespace (newlines, spaces and tabs), although often important in separating lexemes, is usually not returned as a token.
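
A minimal Python scanner sketch tying tokens, patterns and lexemes together; the token names and regular-expression patterns here are illustrative, not those of any particular language:

    import re

    # Each (token, pattern) pair is one rule; order matters, so the
    # keyword IF is tried before the more general ID.
    patterns = [
        ('IF',   r'if\b'),
        ('ID',   r'[A-Za-z][A-Za-z0-9]*'),
        ('NUM',  r'[0-9]+'),
        ('LEQ',  r'<='),
        ('SKIP', r'[ \t\n]+'),     # whitespace separates lexemes
    ]
    scanner = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in patterns))

    def tokens(text):
        for m in scanner.finditer(text):
            if m.lastgroup != 'SKIP':            # whitespace is not returned
                yield (m.lastgroup, m.group())   # (token, lexeme)

    print(list(tokens('if foo123bar <= 42')))
    # [('IF', 'if'), ('ID', 'foo123bar'), ('LEQ', '<='), ('NUM', '42')]

Each yielded pair shows a token name alongside the lexeme that matched its pattern – exactly the token/pattern/lexeme relationship described above.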
