ROLL NO –18700220055
STREAM- IT
SEMESTER- 5th
COMPILER DESIGN
CONTENT
Introduction
Steps to Convert HLL to Machine Code
Phases of Compiler
Role of Lexical Analyzer
Lexemes, Patterns and Tokens
Types of Tokens
Regular Expression of Tokens
Finite Automata
INTRODUCTION
A compiler is software that converts source code into object code. In other words, it translates a high-level language into machine (binary) language. This step is necessary to make a program executable, because the computer understands only binary language.
Some compilers convert the high-level language to assembly language as an intermediate step, while others translate it directly into machine code. This process of converting source code into machine code is called compilation. Let us learn more about it in detail.
STEPS TO CONVERT HLL TO MACHINE CODE
The translation typically proceeds in stages: the source program first passes through a pre-processor (which handles macros and file inclusion), then the compiler translates it into assembly code, an assembler converts the assembly code into relocatable object code, and finally a linker/loader combines the object files and libraries into executable machine code.
PHASES OF COMPILERS
A compiler works in phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation, supported throughout by the symbol table manager and the error handler.
PHASES EXPLANATION WITH EXAMPLES
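The early phases can be observed in miniature using Python's own front end (a sketch; Python is interpreted, but its toolchain performs the same analysis steps on a source string):

```python
# Observing compiler phases with Python's standard library (illustrative).
import ast
import dis
import io
import tokenize

source = "a = b + c * 2"

# Phase 1 - lexical analysis: the character stream becomes a token stream.
tokens = [
    (tok.type, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]
print(tokens)  # pairs such as (NAME, 'a'), (OP, '='), ...

# Phase 2 - syntax analysis: the tokens become an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# Later phases - code generation: the tree becomes lower-level code.
dis.dis(compile(source, "<example>", "exec"))
```

Note how `c * 2` is grouped under the `+` node in the tree, reflecting operator precedence resolved during syntax analysis.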
ROLE OF LEXICAL ANALYZER
The lexical analyzer separates the characters of the source program into groups that logically belong together, called tokens. A token consists of a token name, an abstract symbol that identifies a kind of lexical unit, and an optional attribute value. Tokens can be identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses. A rule that describes the set of input strings for which the same token is produced as output is called a pattern.
The lexical analyzer also handles tasks such as stripping out comments and whitespace (tab, newline, blank, and other characters that are used to separate tokens in the input), and it correlates error messages generated by the compiler with positions in the source program.
For example, it can keep track of all newline characters so that it can associate a line number with each error message. It may also perform the expansion of macros, if a macro pre-processor is used on the source program.
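These duties can be sketched as a small regex-driven lexical analyzer (the token set and patterns below are hypothetical, chosen only to illustrate grouping characters into tokens, stripping whitespace and comments, and tracking line numbers):

```python
# A minimal lexical analyzer sketch (illustrative token set, not from the text).
import re

TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("OP",       r"[+\-*/=]"),
    ("PUNCT",    r"[(),;]"),
    ("COMMENT",  r"#[^\n]*"),   # stripped, not emitted
    ("NEWLINE",  r"\n"),        # counted for line numbers, not emitted
    ("SKIP",     r"[ \t]+"),    # whitespace separates tokens
    ("MISMATCH", r"."),         # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    line = 1
    for m in MASTER.finditer(code):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1                       # track lines for error messages
        elif kind in ("SKIP", "COMMENT"):
            continue                        # stripped out, never reach the parser
        elif kind == "MISMATCH":
            raise SyntaxError(f"line {line}: unexpected character {lexeme!r}")
        else:
            yield (kind, lexeme, line)

print(list(tokenize("x = 10 + y  # a comment\nz = x * 2\n")))
```

Because the line counter survives past the stripped newlines, an error such as an unexpected `$` can be reported with the line on which it occurred.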
LEXEMES, PATTERNS AND TOKENS
Lexemes – A lexeme is an actual character sequence forming a specific instance of a token, such as the character sequence 123 for a number token. In other words, a lexeme is a sequence of characters in the source text that is matched by the pattern for a token.
Patterns – A rule that describes the set of strings associated with a token. A pattern is expressed as a regular expression and describes how a particular token can be formed; the pattern matches each string in the set. For example:
Digits – digit(digit)*
Letter – A|B|…|Z|a|b|…|z
Identifiers – letter(letter|digit)*
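The patterns above can be written as Python regular expressions (a sketch; the character classes stand in for the letter and digit alternations):

```python
# The digit, letter, and identifier patterns as Python regexes (illustrative).
import re

digit      = r"[0-9]"                            # digit
digits     = rf"{digit}{digit}*"                 # digit(digit)* - one or more digits
letter     = r"[A-Za-z]"                         # A|B|...|Z|a|b|...|z
identifier = rf"{letter}(?:{letter}|{digit})*"   # letter(letter|digit)*

# Each lexeme below is a distinct instance matched by the identifier
# pattern, so the lexical analyzer emits the same token for all of them.
for lexeme in ["count", "x1", "rate2020"]:
    print(lexeme, bool(re.fullmatch(identifier, lexeme)))

print("2abc", bool(re.fullmatch(identifier, "2abc")))  # starts with a digit: no match
```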
FINITE AUTOMATA
A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly. Finite automata serve as recognizers for regular expressions: when an input string is fed into a finite automaton, it changes state for each symbol. If the input string is successfully processed and the automaton ends in a final state, the string is accepted, i.e., it is a valid token of the language in hand.
The mathematical model of finite automata consists of:
Finite set of states (Q)
Finite set of input symbols (Σ)
One Start state (q0)
Set of final states (qf)
Transition function (δ)
The transition function (δ) maps a pair of a current state from Q and an input symbol from Σ to a next state in Q: δ : Q × Σ → Q
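The model above can be sketched directly in code: Q, Σ, q0, the final states, and δ as a transition table. This particular automaton recognizes identifiers of the form letter(letter|digit)*; the state names and table are illustrative.

```python
# A DFA sketch for the identifier pattern letter(letter|digit)* (illustrative).
LETTER, DIGIT = "letter", "digit"

def classify(ch):
    """Map a character to its input-symbol class in Sigma."""
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    return None  # character outside the input alphabet

Q = {"start", "in_id"}          # finite set of states
q0 = "start"                    # one start state
F = {"in_id"}                   # set of final states
delta = {                       # transition function: Q x Sigma -> Q
    ("start", LETTER): "in_id",
    ("in_id", LETTER): "in_id",
    ("in_id", DIGIT):  "in_id",
}

def accepts(s):
    state = q0
    for ch in s:
        state = delta.get((state, classify(ch)))
        if state is None:       # no transition defined: reject immediately
            return False
    return state in F           # accept only if we end in a final state

print(accepts("rate2"))   # a valid identifier
print(accepts("2rate"))   # begins with a digit, rejected
print(accepts(""))        # empty string: start state is not final
```

Note that a missing entry in the table (for example, a digit seen in the start state) causes rejection, which is how the DFA encodes "identifiers must begin with a letter."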
THANK YOU