
Chapter One

Introduction to Compiling

Introduction
The use of computer languages is an essential link in
the chain between human and computer.

- Translators for programming languages – the various classes of translator (assemblers, compilers, interpreters)
- Compiler generators – tools that are available to help automate the construction of translators for programming languages.
Translators or compilers

Translators or compilers are programs which accept (as data) a textual representation of an algorithm expressed in a source language, and which produce (as primary output) a representation of the same algorithm expressed in another language, the object or target language.

Phases in developing and using programs written in high-level languages:
- Compilation (compile-time)
- Execution (run-time)
A translator, being a program in its own right, must
itself be written in a computer language, known as
its host or implementation language.

Translators can be developed:
- From scratch in machine language (rare to find today). For any new system one has to come to terms with the machine language and machine architecture of that system. (Disadvantage)
- In high-level languages. Translators for new machines are now invariably developed this way, often using the techniques of cross-compilation and bootstrapping. (Advantage)
Languages involved in the development of
translators:
- The source language to be translated
- The object or target language to be generated
- The host language to be used for implementing
the translator.

The Fortran compilers – the first major translators – were developed by Backus and his colleagues at IBM in the 1950s.
Classes of Translators
Assembler: translators that map low-level language
instructions into machine code which can then be
executed directly.

Compiler: translators that map high-level language instructions into machine code which can then be executed directly.

Decompiler: translators which attempt to take object code at a low level and regenerate source code at a higher level.
 
Any compilation can be broken down into two
major tasks:
Analysis: Discover the structure and primitives of
the source program, determining its meaning.

- concerns itself solely with the properties of the source language
- converts the program text submitted by the programmer into an abstract representation
Structural analysis: determine the static structure of the source program.
Semantic analysis: fix the additional information and check its consistency.
Two subtasks of structural analysis:
Lexical analysis: deals with the basic symbols of the source program
- is described in terms of finite-state automata.
Syntactic analysis or parsing: deals with the static structure of the program
- is described in terms of pushdown automata (uses a stack); a small sketch follows.
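
Why the stack matters: bracketed constructs can nest to any depth, which a finite-state automaton cannot track but a pushdown automaton can. Below is a minimal Python sketch of the idea; the function name and bracket set are chosen for illustration, not taken from any particular compiler.

    def brackets_balanced(text):
        # Pushdown-style recognition: the stack remembers open brackets.
        pairs = {')': '(', ']': '[', '}': '{'}
        stack = []
        for ch in text:
            if ch in '([{':
                stack.append(ch)              # push an opener
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False              # closer without matching opener
        return not stack                      # every opener must be closed

    print(brackets_balanced('f(a[i]) { return x; }'))  # True
    print(brackets_balanced('f(a[i)]'))                # False

Real parsers track grammar rules rather than single characters, but the same stack discipline underlies them.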

Synthesis: Create a target program equivalent to the source program.
Two subtasks of synthesis:
- code generation
- assembly
Code generation: transforms the abstract source
program appearing at the analysis/synthesis
interface into an equivalent target machine
program.
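
A minimal sketch of code generation in Python, assuming a made-up abstract representation (nested tuples for operators, plain values at the leaves) and a hypothetical stack machine with PUSH, ADD and MUL instructions:

    def gen(node, out):
        # Post-order walk: generate code for the operands first,
        # then for the operator itself.
        if isinstance(node, tuple):
            op, left, right = node
            gen(left, out)
            gen(right, out)
            out.append({'+': 'ADD', '*': 'MUL'}[op])
        else:
            out.append(f'PUSH {node}')

    code = []
    gen(('+', 'a', ('*', 'b', 'c')), code)   # abstract form of a + b * c
    print('\n'.join(code))
    # PUSH a / PUSH b / PUSH c / MUL / ADD

Note that the tree's shape already encodes that * binds tighter than +, so the generator needs no knowledge of operator precedence.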

Assembly resolves all target addressing and converts the target machine instructions into an appropriate output format.
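
A minimal two-pass sketch of the assembly step in Python; the instruction set (LOAD, JNZ, HALT) and the LABEL pseudo-operation are invented for illustration. Pass one records the address of each label; pass two replaces symbolic jump targets with numeric addresses:

    program = [
        ('LABEL', 'loop'),
        ('LOAD',  'x'),
        ('JNZ',   'loop'),        # target given symbolically
        ('HALT',  None),
    ]

    # Pass 1: assign addresses; a label marks the next real instruction.
    addresses, pc = {}, 0
    for op, arg in program:
        if op == 'LABEL':
            addresses[arg] = pc
        else:
            pc += 1

    # Pass 2: emit instructions with symbolic targets resolved.
    for op, arg in program:
        if op == 'LABEL':
            continue
        if op == 'JNZ':
            arg = addresses[arg]  # name becomes a numeric address
        print(op, '' if arg is None else arg)

This prints LOAD x, JNZ 0, HALT: the symbolic reference 'loop' has been resolved to address 0.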
Phases in Translation
The components of the translator that handle these
two major phases (analytic and synthetic) are said
to comprise the front end and the back end of the
compiler.
The front end is largely independent of the target
machine.
The back end depends very heavily on the target
machine.
Lexical analysis: This is the initial part of reading and
analysing the program text:
The text is read and divided into tokens.

The lexical analyser or scanner is the section that


fuses characters of the source text into groups that
logically make up the tokens of the language –
symbols like identifiers, strings, numeric constants
,key words, operators (like <=) and so on.
Fig. Phases of a Compiler
Lexical Analysis
A lexical analyser or scanner is a program that
groups sequences of characters into lexemes,
and outputs (to the syntax analyser) a sequence of
tokens.

Tokens are symbolic names for the entities that make up the text of the program;
e.g. if for the keyword if, and id for any identifier.
These make up the output of the lexical analyser.
A pattern is a rule that specifies when a sequence of characters from the input constitutes a token; e.g.
- the sequence i, f for the token if,
- any sequence of alphanumerics starting with a letter for the token id.

A lexeme is a sequence of characters from the input that matches a pattern (and hence constitutes an instance of a token);
- for example: if matches the pattern for if,
- and foo123bar matches the pattern for id.

Whitespace (newlines, spaces and tabs), although often important in separating lexemes, is usually not returned as a token.
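
A minimal Python scanner sketch tying tokens, patterns and lexemes together; the token names and regular-expression patterns here are illustrative, not those of any particular language:

    import re

    # Each (token, pattern) pair is one rule; order matters, so the
    # keyword IF is tried before the more general ID.
    patterns = [
        ('IF',   r'if\b'),
        ('ID',   r'[A-Za-z][A-Za-z0-9]*'),
        ('NUM',  r'[0-9]+'),
        ('LEQ',  r'<='),
        ('SKIP', r'[ \t\n]+'),     # whitespace separates lexemes
    ]
    scanner = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in patterns))

    def tokens(text):
        for m in scanner.finditer(text):
            if m.lastgroup != 'SKIP':            # whitespace is not returned
                yield (m.lastgroup, m.group())   # (token, lexeme)

    print(list(tokens('if foo123bar <= 42')))
    # [('IF', 'if'), ('ID', 'foo123bar'), ('LEQ', '<='), ('NUM', '42')]

Each yielded pair shows a token name alongside the lexeme that matched its pattern – exactly the token/pattern/lexeme relationship described above.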
