What is a Compiler?
A compiler is a computer program that transforms source code written in
a high-level language into low-level machine language. It translates
code written in one programming language into another language without
changing the meaning of the code. The compiler also makes the resulting
code efficient, optimizing it for execution time and memory space.
Features of Compilers
Correctness
Speed of compilation
Preserve the correct meaning of the code
The speed of the target code
Recognize legal and illegal program constructs
Good error reporting/handling
Code debugging help
Types of Compiler
Following are the different types of Compiler:
The Two pass compiler method also simplifies the retargeting process. It
also allows multiple front ends.
Multipass Compilers
Tasks of Compiler
The main tasks performed by the Compiler are:
History of Compiler
Important landmarks in the history of compilers are as follows:
The word “compiler” was first used in the early 1950s by Grace
Murray Hopper.
The first compiler was built by John Backus and his group
between 1954 and 1957 at IBM.
COBOL was the first programming language to be compiled on
multiple platforms, in 1960.
The study of scanning and parsing issues was pursued in the
1960s and 1970s to provide a complete solution.
Preprocessor: The preprocessor is considered a part of the
compiler. It is a tool which produces input for the compiler. It deals
with macro processing, augmentation, language extension, etc.
Interpreter: An interpreter, like a compiler, translates high-level
language into low-level machine language. The main difference
between the two is that an interpreter reads and transforms code
line by line, whereas a compiler reads the entire code at once and
creates the machine code.
Assembler: It translates assembly language code into
machine-understandable language. The output of the assembler is
known as an object file, which is a combination of machine
instructions and the data required to store these instructions in
memory.
Linker: The linker helps you to link and merge various object files to
create an executable file. These files might have been compiled
with separate assemblers. The main task of a linker is to search for
called modules in a program and to find out the memory location
where all modules are stored.
Loader: The loader is a part of the OS which performs the tasks of
loading executable files into memory and running them. It also
calculates the size of a program, which creates additional memory
space.
Cross-compiler: A cross compiler is a compiler that runs on one
platform and generates executable code for a different platform.
Source-to-source Compiler: Source-to-source compiler is a term
used when the source code of one programming language is
translated into the source code of another language.
Scanner generators: This tool takes regular expressions as input.
For example, LEX for the Unix operating system.
Syntax-directed translation engines: These software tools produce
intermediate code by using the parse tree. The goal is to
associate one or more translations with each node of the parse
tree.
Parser generators: A parser generator takes a grammar as input
and automatically generates source code which can parse streams
of characters with the help of a grammar.
Automatic code generators: These take intermediate code and
convert it into machine language.
Data-flow engines: This tool is helpful for code optimization. Here,
information is supplied by the user, and intermediate code is
compared to analyze any relation. It is also known as data-flow
analysis. It helps you to find out how values are transmitted from
one part of the program to another part.
Application of Compilers
Compiler design helps the full implementation of high-level
programming languages.
Supports optimization for computer architecture parallelism.
Design of new memory hierarchies of machines.
Widely used for translating programs.
Used with other software productivity tools.
Summary
A compiler is a computer program which helps you transform
source code written in a high-level language into low-level machine
language.
Correctness, speed of compilation, and preserving the correct
meaning of the code are some important features of compiler
design.
Compilers are divided into three types: 1) Single Pass Compilers,
2) Two Pass Compilers, and 3) Multipass Compilers.
The word “compiler” was first used in the early 1950s by Grace
Murray Hopper.
Steps for Language processing system are: Preprocessor,
Interpreter, Assembler, Linker/Loader.
Important compiler construction tools are 1) Scanner generators,
2) Syntax-directed translation engines, 3) Parser generators, 4)
Automatic code generators.
The main task of the compiler is to verify the entire program, so
there are no syntax or semantic errors.
Phases of Compiler
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generator
5. Code optimizer
6. Code generator
All these phases convert the source code by dividing it into tokens,
creating parse trees, and optimizing the source code in different phases.
Example:
x = y + 10
Tokens
x    identifier
=    Assignment operator
y    identifier
+    Addition operator
10   Number
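The token table above can be reproduced with a small scanner. The following is only a minimal sketch: the token names (NUMBER, IDENTIFIER, ASSIGN, PLUS) and the function name tokenize are assumptions made for this illustration, and a real lexer would cover many more token classes.

```python
import re

# Hypothetical token classes, one regular expression per class.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is recognized but never emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # No pattern matches here: this is a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = y + 10"))
```

Each pair holds the token class and the lexeme that matched it, which mirrors the two columns of the table.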
Phase 2: Syntax Analysis
Syntax analysis is all about discovering structure in code. It determines
whether or not a text follows the expected format. The main aim of this
phase is to check whether the source code written by the programmer is
correct or not.
Syntax analysis is based on the rules of the specific programming
language and constructs the parse tree with the help of tokens. It also
determines the structure of the source language and the grammar or
syntax of the language.
Here is a list of tasks performed in this phase:
Example
Any identifier/number is an expression.
If x is an identifier and y+10 is an expression, then x = y+10 is a statement.
Consider the parse tree for the following example:
(a+b)*c
In the parse tree:
Interior node: a record with an operator field and two fields for children
Leaf: a record with two or more fields; one for the token and the
others for information about the token
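The two kinds of record described above can be sketched as small data classes. The class names Leaf and Interior, their field names, and the helper to_source are assumptions chosen for this illustration only.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    token: str   # token class, e.g. "identifier"
    value: str   # other information about the token (here, its lexeme)


@dataclass
class Interior:
    operator: str   # the operator field
    left: object    # first child
    right: object   # second child


# Parse tree for (a+b)*c: the '+' node is a child of the '*' node.
tree = Interior(
    "*",
    Interior("+", Leaf("identifier", "a"), Leaf("identifier", "b")),
    Leaf("identifier", "c"),
)


def to_source(node):
    """Rebuild a fully parenthesized expression from the tree."""
    if isinstance(node, Leaf):
        return node.value
    return f"({to_source(node.left)} {node.operator} {to_source(node.right)})"


print(to_source(tree))
```

The parentheses of the original expression are not stored anywhere; the grouping is carried by the shape of the tree itself.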
Phase 3: Semantic Analysis
Semantic analysis checks the semantic consistency of the code. It uses
the syntax tree of the previous phase along with the symbol table to
verify that the given source code is semantically consistent. It also
checks whether the code is conveying an appropriate meaning.
The semantic analyzer will check for type mismatches, incompatible
operands, a function called with improper arguments, an undeclared
variable, etc.
Functions of the semantic analysis phase are:
Ensure that the components of the program fit together
meaningfully
Gathers type information and checks for type compatibility
Checks operands are permitted by the source language
Example
float x = 20.2;
float y = x*30;
In the above code, the semantic analyzer will typecast the integer 30 to
float 30.0 before the multiplication.
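The conversion step in this example can be sketched as a type check on one binary operation. This is a simplified illustration, assuming only two numeric types; the function name check_binary and the (type, text) pair representation are hypothetical.

```python
def check_binary(op, left, right):
    """Type-check one binary operation.

    left and right are (type, text) pairs; the result is the (type, text)
    pair of the whole expression, with an int operand widened to float
    when the two operand types differ.
    """
    ltype, ltext = left
    rtype, rtext = right
    numeric = {"int", "float"}
    if ltype not in numeric or rtype not in numeric:
        # Incompatible operands are reported as a semantic error.
        raise TypeError(f"operator {op!r} not permitted for {ltype} and {rtype}")
    if ltype == rtype:
        return (ltype, f"{ltext} {op} {rtext}")
    if ltype == "int":
        ltext = f"int_to_float({ltext})"
    else:
        rtext = f"int_to_float({rtext})"
    return ("float", f"{ltext} {op} {rtext}")


print(check_binary("*", ("float", "x"), ("int", "30")))
```

For the statement above, the float operand x is left alone and the integer 30 is wrapped in a conversion, so the multiplication is carried out on two floats.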
Example:
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
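The four instructions above compute total from count, rate and the integer 5. A sketch of how such three-address code could be generated from an expression tree follows. It assumes, for illustration only, that every variable is a float and that every integer literal must be converted; the function name gen_tac and the tuple representation of the tree are hypothetical.

```python
import itertools


def gen_tac(target, expr):
    """Generate three-address code for `target := expr`.

    expr is a variable name (str), an integer literal (int), or a tuple
    (operator, left, right).
    """
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"t{next(counter)}"

    def emit(node):
        if isinstance(node, int):
            # Assumption: all variables are floats, so literals are converted.
            temp = new_temp()
            code.append(f"{temp} := int_to_float({node})")
            return temp
        if isinstance(node, str):
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code


for line in gen_tac("total", ("+", "count", ("*", "rate", 5))):
    print(line)
```

Each instruction has at most one operator on its right-hand side, and every intermediate result gets its own temporary name.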
Phase 5: Code Optimization
The next phase is code optimization of the intermediate code. This phase
removes unnecessary lines of code and arranges the sequence of statements
to speed up the execution of the program without wasting resources.
The main goal of this phase is to improve the intermediate code so that
the generated code runs faster and occupies less space.
The primary functions of this phase are:
Example:
Consider the following code
a = int_to_float(10)
b = c * a
d = e + b
f = d
Can become
b = c * 10.0
f = e + b
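The rewrite in this example combines two simple steps: the conversion of a constant is done at compile time, and a value that is only copied is written straight to its final name. The sketch below shows one way to express that. It is not a general optimizer: it assumes each name is assigned once and that a copied temporary is not used again later. The tuple format (destination, operator, arguments) and the function name optimize are hypothetical.

```python
def optimize(code):
    """Fold constant conversions and remove a copy that follows its source."""
    constants = {}
    folded = []
    for dest, op, args in code:
        # Replace any argument whose constant value is already known.
        args = tuple(constants.get(arg, arg) for arg in args)
        if op == "int_to_float" and args[0].isdigit():
            constants[dest] = f"{args[0]}.0"  # done at compile time
            continue
        folded.append((dest, op, args))

    result = []
    for dest, op, args in folded:
        if op == "copy" and result and result[-1][0] == args[0]:
            # Assumption: the copied temporary is not needed afterwards.
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
        else:
            result.append((dest, op, args))
    return result


code = [
    ("a", "int_to_float", ("10",)),
    ("b", "*", ("c", "a")),
    ("d", "+", ("e", "b")),
    ("f", "copy", ("d",)),
]
for dest, op, args in optimize(code):
    print(dest, "=", f" {op} ".join(args))
```

Four instructions shrink to two, matching the optimized form shown in the example.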
Example:
a = b + 60.0
Would possibly be translated to register instructions such as:
MOVF b, R1
ADDF #60.0, R1
MOVF R1, a
Summary
A compiler operates in various phases; each phase transforms the
source program from one representation to another
Six phases of compiler design are 1) Lexical analysis 2) Syntax
analysis 3) Semantic analysis 4) Intermediate code generator 5)
Code optimizer 6) Code generator
Lexical analysis is the first phase, in which the compiler scans the
source code
Syntax analysis is all about discovering structure in text
Semantic analysis checks the semantic consistency of the code
Once the semantic analysis phase is over, the compiler generates
intermediate code for the target machine
The code optimization phase removes unnecessary lines of code and
arranges the sequence of statements
The code generation phase gets its input from the code optimization
phase and produces the target code or object code as a result
A symbol table contains a record for each identifier with fields for
the attributes of the identifier
The error handling routine handles and reports errors during many
phases
What is Lexical Analysis?
Lexical analysis is the very first phase of compiler design. A lexer
takes the modified source code, which is written in the form of
sentences, and converts the sequence of characters into a sequence of
tokens. It also removes any extra whitespace and comments written in
the source code.
Example
Now, check this example, we can also read this. However, it will take
some time because separators are put in the Odd Places. It is not
something which comes to you immediately.
Basic Terminologies
What’s a lexeme?
A lexeme is a sequence of characters in the source program that matches
the pattern of a token. It is nothing but an instance of a token.
What’s a token?
A token in compiler design is a sequence of characters which
represents a unit of information in the source program.
What is a pattern?
A pattern is a description of the form that the lexemes of a token may
take. In the case of a keyword used as a token, the pattern is simply
the sequence of characters that forms the keyword.
Lexical Analyzer Architecture
The lexical analyzer scans the entire source code of the program and
identifies each token one by one. Scanners are usually implemented to
produce tokens only when requested by a parser. Here is how the
recognition of tokens in compiler design works:
1. “Get next token” is a command which is sent from the parser to the
lexical analyzer.
2. On receiving this command, the lexical analyzer scans the input
until it finds the next token.
3. It returns the token to the parser.
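The request and response steps above map naturally onto a Python generator, where each call to next() plays the role of the "get next token" command. This is a minimal sketch: it assumes lexemes are separated by whitespace, and the function name lexer and the three token classes are hypothetical.

```python
def lexer(source):
    """Yield one (token class, lexeme) pair each time the parser asks."""
    for lexeme in source.split():
        if lexeme.isdigit():
            kind = "NUMBER"
        elif lexeme.isidentifier():
            kind = "IDENTIFIER"
        else:
            kind = "OPERATOR"
        yield (kind, lexeme)


tokens = lexer("x = y + 10")
# Nothing has been scanned yet; the parser pulls tokens one at a time.
print(next(tokens))
print(next(tokens))
```

Because the generator is lazy, the rest of the input is only scanned when the parser requests further tokens.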
Lexeme       Token
int          Keyword
maximum      Identifier
(            Operator
int          Keyword
x            Identifier
,            Operator
int          Keyword
y            Identifier
)            Operator
{            Operator
if           Keyword
Examples of Nontokens
Type         Examples
Macro        NUMS
Whitespace   \n \b \t
Lexical Errors
A character sequence which is not possible to scan into any valid token is
a lexical error. Important facts about the lexical error:
Summary
Lexical analysis is the very first phase in compiler design
Lexemes are the sequences of characters in the source program that
match the pattern of a token
The lexical analyzer is implemented to scan the entire source code of
the program
The lexical analyzer helps to enter identified tokens into the symbol
table
A character sequence which is not possible to scan into any valid
token is a lexical error
Removing one character from the remaining input is a useful error
recovery method
The lexical analyzer scans the input program while the parser performs
syntax analysis
Separating the two eases the process of lexical analysis and syntax
analysis by eliminating unwanted tokens
Lexical analyzers are used by web browsers to format and display a
web page with the help of parsed data from JavaScript, HTML, and CSS
The biggest drawback of using a lexical analyzer is the additional
runtime overhead required to generate the lexer tables and construct
the tokens
What is Syntax Analysis?
Syntax analysis is the second phase of the compiler design process, in
which the given input string is checked for conformance to the rules and
structure of the formal grammar. It analyses the syntactical structure
and checks whether the given input is in the correct syntax of the
programming language or not.
A parser also checks that the input string is well-formed and, if not,
rejects it.
Following are important tasks performed by the parser in compiler design:
Parsing techniques are divided into two different groups: Top-Down
Parsing and Bottom-Up Parsing.
Top-Down Parsing:
In top-down parsing, construction of the parse tree starts at the root
and then proceeds towards the leaves.
1. Predictive Parsing:
This parsing technique recursively parses the input to make a parse tree.
It consists of several small functions, one for each nonterminal in the
grammar.
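The idea of one small function per nonterminal can be sketched with a recursive-descent style parser. The grammar used here is an assumption made for the illustration (expr -> term ('+' term)*, term -> factor ('*' factor)*, factor -> NAME | '(' expr ')'), as are the function names. The tree is returned as nested tuples, and characters outside the grammar are silently skipped by the simple tokenizer.

```python
import re


def parse(source):
    """Parse an expression and return its tree as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|[()+*]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():      # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():      # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():    # factor -> NAME | '(' expr ')'
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(a+b)*c"))
```

Parsing starts at the root nonterminal expr and works down towards the leaves, which is what makes this a top-down technique.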
Bottom-Up Parsing:
In bottom-up parsing, the construction of the parse tree starts with the
leaves and then proceeds towards the root. It is also called
shift-reduce parsing. This type of parser is usually created with the
help of software tools.
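The shift and reduce actions can be illustrated by hand for one tiny grammar. The sketch below assumes the hypothetical grammar E -> E + id | id and hard-codes its two reductions; parsers produced by software tools drive the same two actions from generated tables instead.

```python
def shift_reduce(tokens):
    """Recognize the grammar  E -> E + id | id  with a stack."""
    stack = []
    remaining = list(tokens)
    trace = []
    while True:
        if stack[-3:] == ["E", "+", "id"]:
            stack[-3:] = ["E"]              # reduce by E -> E + id
            trace.append("reduce E -> E + id")
        elif stack == ["id"]:
            stack[:] = ["E"]                # reduce by E -> id
            trace.append("reduce E -> id")
        elif remaining:
            token = remaining.pop(0)        # shift the next input token
            stack.append(token)
            trace.append(f"shift {token}")
        else:
            break
    # The input is accepted when only the start symbol is left.
    return stack == ["E"], trace


accepted, trace = shift_reduce(["id", "+", "id"])
print(accepted)
for step in trace:
    print(step)
```

The leaves (the id tokens) are handled first and the start symbol E is produced last, so the tree is built from the leaves towards the root.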
A parser should be able to detect and report any error found in the
program. So, whenever an error occurs, the parser should be able to
handle it and carry on parsing the remaining input. A program can have
the following types of errors at various stages of the compilation
process:
Lexical: Name of an incorrectly typed identifier
Syntactical: Unbalanced parenthesis or a missing semicolon
Semantical: Incompatible value assignment
Logical: Infinite loop and unreachable code
There are five common error-recovery methods which can be implemented
in the parser.
Statement Mode Recovery
When the parser encounters an error, it takes corrective steps so
that the rest of the input can be parsed.
For example, adding a missing semicolon comes under the statement
mode recovery method. However, the parser designer needs to be careful
while making these changes, as one wrong correction may lead to an
infinite loop.
Panic-Mode Recovery
When the parser encounters an error, this mode ignores the rest of
the statement and does not process the input from the erroneous
token up to a delimiter, like a semicolon. This is a simple error
recovery method.
In this type of recovery method, the parser rejects input symbols
one by one until one of a designated group of synchronizing tokens
is found. The synchronizing tokens are generally delimiters.
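Panic-mode recovery can be sketched for a toy language. The example assumes, purely for illustration, that every statement has the form "name = value ;" and that the semicolon is the only synchronizing token; the function name parse_statements is hypothetical.

```python
def parse_statements(tokens):
    """Parse 'name = value ;' statements, recovering in panic mode."""
    statements = []
    errors = []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        well_formed = (
            len(window) == 4
            and window[0].isidentifier()
            and window[1] == "="
            and (window[2].isidentifier() or window[2].isdigit())
            and window[3] == ";"
        )
        if well_formed:
            statements.append((window[0], window[2]))
            pos += 4
            continue
        errors.append(f"syntax error near token {pos}")
        # Panic mode: discard tokens until the synchronizing token is found.
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1  # skip the synchronizing token itself
    return statements, errors


statements, errors = parse_statements("x = 1 ; y = = 2 ; z = x ;".split())
print(statements)
print(errors)
```

The faulty second statement is reported once and skipped, and parsing resumes cleanly with the third statement.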
Phrase-Level Recovery:
The compiler corrects the program by inserting or deleting tokens. This
allows it to proceed to parse from where it was. It performs the
correction on the remaining input. It can replace a prefix of the
remaining input with some string, which helps the parser to continue
the process.
Error Productions
Error production recovery expands the grammar of the language
with productions that generate the erroneous constructs. The parser
then performs error diagnostics about that construct.
Global Correction:
Notational Conventions
In the notational conventions, an optional element may be indicated by
enclosing it in square brackets. An arbitrary sequence of instances of
an element can be indicated by enclosing the element in braces
followed by an asterisk symbol, { … }*.
A choice of alternatives may be written within a single rule using a
separator symbol, and may be enclosed in parentheses when needed.
1.Terminals:
2.Nonterminals:
Grammar Derivation
A grammar derivation is a sequence of grammar rule applications which
transforms the start symbol into the string. A derivation proves that
the string belongs to the grammar’s language.
Left-most Derivation
When the sentential form of the input is scanned and replaced in
left-to-right sequence, it is known as left-most derivation. The
sentential form which is derived by the left-most derivation is called
the left-sentential form.
Right-most Derivation
When the sentential form of the input is scanned and replaced with
production rules in right-to-left sequence, it is known as right-most
derivation. The sentential form which is derived by the right-most
derivation is known as the right-sentential form.
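The difference between the two derivation orders can be seen by deriving the same string both ways. The sketch assumes a hypothetical grammar E -> E + T | T, T -> id, in which nonterminals are written in upper case; the function name derive is also an assumption.

```python
def derive(start, productions, leftmost=True):
    """Apply productions in order, always expanding the left-most
    (or right-most) nonterminal, and return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for head, body in productions:
        # Nonterminals are the upper-case symbols in this sketch.
        positions = [i for i, symbol in enumerate(form) if symbol.isupper()]
        index = positions[0] if leftmost else positions[-1]
        if form[index] != head:
            raise ValueError(f"cannot apply {head} -> {' '.join(body)} here")
        form[index:index + 1] = body
        steps.append(" ".join(form))
    return steps


E_PLUS = ("E", ["E", "+", "T"])
E_T = ("E", ["T"])
T_ID = ("T", ["id"])

print(" => ".join(derive("E", [E_PLUS, E_T, T_ID, T_ID], leftmost=True)))
print(" => ".join(derive("E", [E_PLUS, T_ID, E_T, T_ID], leftmost=False)))
```

Both derivations end in the same string, id + id, but they pass through different sentential forms along the way.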
It receives inputs, in the form of tokens, from lexical analysers.
It is responsible for the validity of a token supplied by
Summary
Syntax analysis is the second phase of the compiler design process
and comes after lexical analysis
The syntactical analyser helps you to apply rules to the code
Sentence, Lexeme, Token, Keywords and reserved words, Noise
words, Comments, Delimiters, Character set, and Identifiers are some
important terms used in syntax analysis in compiler construction
A parser checks that the input string is well-formed and, if not,
rejects it
Parsing techniques are divided into two different groups: Top-Down
Parsing and Bottom-Up Parsing
Lexical, syntactical, semantical, and logical errors are some common
errors that occur during parsing
A grammar is a set of structural rules which describe a language
In the notational conventions, an element may be indicated by
enclosing it in square brackets
A CFG is a left-recursive grammar that has at least one production of
the type
Grammar derivation is a sequence of grammar rules which
transforms the start symbol into the string
The syntax analyser mainly deals with recursive constructs of the
language, while the lexical analyser eases the task of the syntax
analyser
The drawback of the syntax analyser is that it will never
determine if a token is valid or not
What is Compiler?
A compiler is a computer program that transforms code written in a
high-level programming language into machine code. It is a program
which translates human-readable code into a language a computer
processor understands (binary 1 and 0 bits). The computer processes the
machine code to perform the corresponding tasks.
The source code must comply with the syntax rules of the programming
language in which it is written. However, the compiler is only a program
and cannot fix errors found in that source code. So, if you make a
mistake, you need to correct the syntax of your program. Otherwise, it
will not compile.
What is Interpreter?
An interpreter is a computer program which converts each high-level
program statement into machine code. This includes source code,
pre-compiled code, and scripts. Both compilers and interpreters do the
same job, which is converting a higher-level programming language to
machine code. However, a compiler converts the code into machine
code (creating an exe) before the program runs, whereas an interpreter
converts code into machine code while the program is running.
Basis of difference: Programming steps
Compiler: Create the program. The compiler will parse or analyse all of
the language statements for correctness. If incorrect, it throws an
error. If there is no error, the compiler will convert the source code
to machine code. It links different code files into a runnable program
(known as an exe). Run the program.
Interpreter: Create the program. No linking of files or machine code
generation. Source statements are executed line by line DURING
execution.

Basis of difference: Machine code
Compiler: Stores machine language as machine code on the disk.
Interpreter: Does not save machine code at all.

Basis of difference: Program generation
Compiler: Generates an output program (in the form of an exe) which can
be run independently from the original program.
Interpreter: Does not generate an output program, so it evaluates the
source program every time during execution.

Basis of difference: Execution
Compiler: Program execution is separate from the compilation. It is
performed only after the entire output program is compiled.
Interpreter: Program execution is a part of the interpretation process,
so it is performed line by line.

Basis of difference: Memory requirement
Compiler: The target program executes independently and does not
require the compiler in memory.
Interpreter: The interpreter exists in memory during interpretation.

Basis of difference: Best suited for
Compiler: Bound to the specific target machine and cannot be ported.
C and C++ are the most popular programming languages which use the
compilation model.
Interpreter: For web environments, where load times are important.
Because of all the exhaustive analysis that is done, compilers take a
relatively long time to compile even small code that may not be run
multiple times. In such cases, interpreters are better.

Basis of difference: Dynamic typing
Compiler: Difficult to implement, as compilers cannot predict what
happens at run time.
Interpreter: Interpreted languages support dynamic typing.

Basis of difference: Error execution
Compiler: The compiler displays all errors and warnings at compilation
time. Therefore, you can’t run the program without fixing errors.
Interpreter: The interpreter reads a single statement and shows the
error, if any. You must correct the error to interpret the next line.
Role of Compiler
Compilers read the source code and output executable code.
A compiler translates software written in a higher-level language into
instructions that the computer can understand. It converts the text
that a programmer writes into a format the CPU can understand.
The process of compilation is relatively complicated. It spends a lot
of time analyzing and processing the program.
The executable result is some form of machine-specific binary code.
Role of Interpreter
The interpreter converts the source code line by line during run
time.
An interpreter translates a program written in a high-level
language into machine-level language.
An interpreter allows evaluation and modification of the program while
it is executing.
Relatively less time is spent analyzing and processing the program.
Program execution is relatively slow compared to a compiler.
HIGH-LEVEL LANGUAGES
High-level languages, like C, C++, Java, etc., are very near to English.
They make the programming process easy. However, a program in a
high-level language, also known as source code, must be translated into
machine language before execution. This translation is conducted by
either a compiler or an interpreter.
MACHINE CODE
Machine languages are very close to the hardware. Every computer has
its own machine language. Machine language programs are made up of a
series of binary patterns (e.g. 110110), which represent the simple
operations to be performed by the computer. Machine language
programs are executable, so they can be run directly.
OBJECT CODE
On compilation of source code, the machine code generated for different
processors like Intel, AMD, and ARM is different. To make code portable,
the source code is first converted to object code. It is an intermediary
code (similar to machine code) that no processor will understand. At run
time, the object code is converted to the machine code of the underlying
platform.