1. What is peephole optimization? Explain the common peephole optimization techniques.
Peephole optimization is a type of compiler optimization technique that
involves analyzing a small section of code, called a "peephole," and making
local changes to improve the efficiency or performance of the code. The
peephole optimization technique is typically applied after more general
optimizations have been performed, and it is intended to fine-tune the code by
making small, targeted changes.
Peephole optimization can involve a number of different techniques, but some
of the most common include:
1. Constant folding: This involves evaluating expressions that contain only
constants at compile time, and replacing the expression with the result of the
evaluation. For example, the expression "2+2" could be replaced with "4".
2. Strength reduction: This involves replacing expensive operations, such as
multiplication or division, with less expensive operations, such as addition or
subtraction. For example, the expression "x*2" could be replaced with "x+x".
3. Dead code elimination: This involves removing code that is never executed,
such as code that follows an unconditional "return" statement.
4. Loop unrolling: This involves duplicating the code inside a loop, in order to
reduce the number of iterations required to execute the loop.
5. Instruction combining: This involves combining two or more instructions into
a single instruction, in order to reduce the number of instructions required to
execute the program.
Overall, the goal of peephole optimization is to make small, targeted changes
to the code that can have a big impact on its efficiency and performance. By
analyzing the code at a local level and making targeted optimizations,
compilers can produce more efficient code that runs faster and uses fewer
resources.
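The flavor of a peephole pass can be illustrated with a small sketch. The following Python fragment is only a sketch under assumed conventions: the tuple-based three-address instruction format and the pattern set are illustrative, not any real compiler's IR. It scans the instruction list one instruction at a time and applies constant folding and strength reduction:
```python
# A minimal peephole optimizer sketch over a made-up three-address IR.
# Each instruction is a tuple: (dest, op, arg1, arg2).

def peephole(instructions):
    optimized = []
    for dest, op, a, b in instructions:
        # Constant folding: both operands are literal numbers.
        if op in ('+', '-', '*') and isinstance(a, int) and isinstance(b, int):
            value = {'+': a + b, '-': a - b, '*': a * b}[op]
            optimized.append((dest, 'const', value, None))
        # Strength reduction: x * 2 becomes x + x.
        elif op == '*' and b == 2:
            optimized.append((dest, '+', a, a))
        else:
            optimized.append((dest, op, a, b))
    return optimized

code = [
    ('t1', '+', 2, 2),        # folded to t1 = const 4
    ('t2', '*', 'x', 2),      # strength-reduced to t2 = x + x
    ('t3', '*', 'x', 'y'),    # left unchanged
]
print(peephole(code))
```
A real peephole optimizer would typically look at windows of several adjacent instructions and also remove redundant loads, stores, and jumps.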
Unit-4
1. Define compiler. Describe the logical phases of a compiler with a neat
sketch, show the output of each phase, using the example of the following
statement
position := initial + rate * 60
Ans)
A compiler is a software program that transforms source code written in a
programming language into machine code that can be executed by a computer.
The logical phases of a compiler include:
1. Lexical Analysis: This phase involves breaking down the source code into a
sequence of tokens, where each token represents a valid unit of the
programming language. For example, in the statement "position := initial + rate
* 60", the tokens would be "position", ":=", "initial", "+", "rate", "*", and "60".
The output of this phase is a stream of tokens.
2. Syntax Analysis: This phase involves checking the tokens produced in the
previous phase against the grammar rules of the programming language to
ensure that they form a valid expression or statement. This phase creates a
parse tree or an abstract syntax tree (AST) representing the statement. For
example, the parse tree for the statement "position := initial + rate * 60" would
look like the following:
```
          :=
         /  \
  position    +
             / \
      initial   *
               / \
            rate  60
```
3. Semantic Analysis: This phase involves checking the meaning of the
program and whether it conforms to the language specifications. This includes
type checking, name resolution, and the detection of semantic errors. For
example, if the variable "rate" was not previously defined in the program, a
semantic error would be raised.
4. Intermediate Code Generation: This phase involves transforming the AST
produced in the syntax analysis phase into an intermediate representation.
This intermediate code is generally platform-independent and is used to
optimize the code and make it more efficient.
5. Code Optimization: This phase involves optimizing the intermediate code to
improve the performance and efficiency of the final executable code. This
includes dead code elimination, constant folding, and loop unrolling.
6. Code Generation: This is the final phase of the compiler process. In this
phase, the optimized intermediate code is translated into machine code for the
target platform. The output of this phase is an executable file that can be run
on the target machine.
The output of each phase of the compiler process for the statement
"position := initial + rate * 60" would be as follows:
1. Lexical Analysis:
```
Token Stream:
position, :=, initial, +, rate, *, 60
```
2. Syntax Analysis:
```
AST:
          :=
         /  \
  position    +
             / \
      initial   *
               / \
            rate  60
```
3. Semantic Analysis: No errors. Assuming position, initial, and rate are declared as reals, the type checker also notes that the integer constant 60 must be converted to a real before the multiplication (for example, by inserting a conversion such as inttofloat(60) into the tree).
4. Intermediate Code Generation:
```
Three-address code:
t1 = rate * 60
t2 = initial + t1
position = t2
```
5. Code Optimization: In this simple example the optimizer can at most eliminate the copy through t2, leaving t1 = rate * 60 followed by position = initial + t1.
6. Code Generation:
```
Machine Code:
LOAD rate
MUL 60
STORE t1
LOAD initial
ADD t1
STORE position
```
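As a rough sketch of the intermediate code generation step for this statement, the fragment below walks the AST shown above and emits the same three-address code; the tuple-based AST shape and the gen helper are assumptions made for illustration:
```python
# Hedged sketch: emit three-address code from a hand-built AST for
# "position := initial + rate * 60". The tuple AST shape is illustrative.
ast = (':=', 'position', ('+', 'initial', ('*', 'rate', '60')))

code = []

def gen(node):
    """Return the name (identifier or temporary) holding the node's value."""
    if isinstance(node, str):          # leaf: identifier or constant
        return node
    op, left, right = node
    l, r = gen(left), gen(right)
    if op == ':=':
        code.append(f'{l} = {r}')
        return l
    temp = f't{len(code) + 1}'         # fresh temporary name
    code.append(f'{temp} = {l} {op} {r}')
    return temp

gen(ast)
print('\n'.join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```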
2) Explain the chief functions of lexical analysis phase.
Ans) The main function of the lexical analysis phase, carried out by the scanner
(or tokenizer), is to break the input source code into a sequence of tokens.
Tokens are the smallest units of meaning in a programming language and are
used to represent keywords, identifiers, operators, literals, and other elements
of the language.
Here are some of the chief functions of the lexical analysis phase:
1. Tokenization: The lexical analyzer scans the source code character by
character and groups them into tokens. Each token represents a specific unit
of meaning, such as a keyword, identifier, operator, or literal.
2. Removal of whitespace and comments: The lexical analyzer strips whitespace
characters, such as spaces, tabs, and newlines, as well as comments, so that they
never reach the parser. This keeps the token stream compact and easier to process.
3. Error handling: The lexical analyzer detects and reports lexical errors, such
as misspelled keywords or unrecognized characters, to the user.
4. Symbol table management: The lexical analyzer maintains a symbol table
that stores information about the identifiers used in the program, such as their
names, types, and memory locations.
5. Efficiency of later phases: The lexical analyzer applies the longest-match rule,
so that multi-character operators such as ":=" or "<=" are delivered as single
tokens; this keeps the grammar simpler and makes the subsequent phases of the
compiler more efficient.
Overall, the lexical analysis phase is an important component of the compiler
process that plays a critical role in transforming the input source code into a
form that can be easily processed by the subsequent phases of the compiler.
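A minimal scanner sketch makes these functions concrete. The token classes and the '#' comment syntax below are illustrative assumptions for a toy language, not any particular language's definition:
```python
import re

# Minimal scanner sketch: token classes and the '#' comment syntax are
# illustrative assumptions, not a specific language's definition.
TOKEN_SPEC = [
    ('COMMENT', r'\#[^\n]*'),       # stripped, never emitted
    ('WS',      r'\s+'),            # stripped, never emitted
    ('ASSIGN',  r':='),
    ('NUM',     r'\d+'),
    ('ID',      r'[A-Za-z_]\w*'),
    ('OP',      r'[+\-*/]'),
    ('ERROR',   r'.'),              # any other character is a lexical error
]
PATTERN = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    for m in PATTERN.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ('WS', 'COMMENT'):
            continue                       # whitespace/comment removal
        if kind == 'ERROR':
            print(f'lexical error: unexpected character {lexeme!r}')
            continue                       # simple error handling
        tokens.append((kind, lexeme))      # tokenization
    return tokens

print(tokenize('position := initial + rate * 60  # update position'))
```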
3. What is the role of transition diagrams in the construction of lexical analyzer?
Ans) Transition diagrams are commonly used in the construction of lexical analyzers.
A transition diagram is a graphical representation of a finite state machine that
shows how the machine moves from one state to another in response to input symbols.
The role of transition diagrams in the construction of lexical analyzers is as
follows:
1. Specification of token patterns: The lexical analyzer is designed to recognize
specific patterns of characters in the input stream and convert them into
tokens. Transition diagrams can be used to specify the regular expressions
that define these patterns, such as identifiers, literals, and operators.
2. Construction of finite state machine: The transition diagrams are then used
to construct a finite state machine that implements the regular expressions
specified in the previous step. Each state in the machine corresponds to a
specific pattern that the lexical analyzer can recognize.
3. Generation of code: Once the finite state machine has been constructed, the
next step is to generate the code that implements the lexical analyzer. This
code typically consists of a loop that reads input characters one by one and
uses the finite state machine to recognize the corresponding tokens.
4. Testing and debugging: Finally, the lexical analyzer is tested and debugged
to ensure that it correctly recognizes all the tokens in the input stream and
handles errors and edge cases gracefully. The transition diagrams can be
useful in this process, as they provide a visual representation of the state
machine and help developers identify potential issues or areas for
improvement.
In summary, transition diagrams are an important tool in the construction of
lexical analyzers, as they help developers specify token patterns, construct
finite state machines, generate code, and test and debug the analyzer.
4. How is a finite automaton used to represent tokens and perform lexical analysis?
Explain with examples.
Ans) A finite automaton, also known as a finite state machine, is a
mathematical model used to recognize regular languages, such as those
defined by the regular expressions used to specify tokens in programming
languages. In the context of lexical analysis, a finite automaton can be used to
represent the different types of tokens in a program and to recognize these
tokens in the input stream.
Here's an example of how a finite automaton can be used to recognize
identifiers in a programming language:
1. Define the regular expression for identifiers: In most programming
languages, an identifier is a sequence of letters, digits, and underscores that
starts with a letter or underscore. The regular expression for identifiers can be
expressed as [a-zA-Z_][a-zA-Z0-9_]*.
2. Construct the finite automaton: The finite automaton for this regular
expression consists of a series of states, each of which corresponds to a
specific character or group of characters in the input stream. The automaton
starts in an initial state and transitions to a new state for each character it
reads from the input stream.
3. Label the states: In the case of the identifier regular expression, we can label
the states based on the current position in the identifier. For example, the
initial state can be labeled "start", the next state can be labeled
"letter_or_underscore", and subsequent states can be labeled
"letter_or_digit_or_underscore".
4. Recognize tokens: As the automaton reads characters from the input stream,
it moves from one state to another based on the current input and the current
state. When it reaches an accepting state, it recognizes the corresponding
token, in this case an identifier.
In this example, the finite automaton represents the regular expression for a
specific type of token and recognizes instances of that token in the input stream.
This is an essential part of the lexical analysis phase of a compiler, which
converts the input source code into a sequence of tokens that can be further
processed by the later phases.
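The same automaton can be simulated with an explicit transition table, as in the following sketch (the state names and table layout are illustrative assumptions):
```python
# Hedged sketch: simulate the identifier DFA [a-zA-Z_][a-zA-Z0-9_]* with an
# explicit transition table. State names are illustrative.
def char_class(c):
    if c.isalpha() or c == '_':
        return 'letter_or_underscore'
    if c.isdigit():
        return 'digit'
    return 'other'

TRANSITIONS = {
    ('start', 'letter_or_underscore'): 'in_identifier',
    ('in_identifier', 'letter_or_underscore'): 'in_identifier',
    ('in_identifier', 'digit'): 'in_identifier',
}
ACCEPTING = {'in_identifier'}

def is_identifier(text):
    state = 'start'
    for c in text:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:
            return False          # dead state: no valid transition
    return state in ACCEPTING

print(is_identifier('rate_2'))    # True
print(is_identifier('2rate'))     # False
```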
5. Differentiate between token, lexeme and pattern with examples
Ans) A pattern is the rule, usually written as a regular expression, that describes
the set of strings a token can match. A lexeme is the actual sequence of characters
in the source program that matches a pattern. A token is the pair (token name,
optional attribute value) that the lexical analyzer produces for a matched lexeme.
Examples:
1. Token: id; Pattern: a letter or underscore followed by letters, digits, or
underscores; Lexemes: count, rate, sum_1.
2. Token: num; Pattern: any numeric constant; Lexemes: 60, 3.14.
3. Token: relop; Pattern: < or <= or = or <> or > or >=; Lexemes: <, <=, >.
4. Token: if (keyword); Pattern: the character i followed by f; Lexeme: if.
6. Explain the recognition of keywords and identifiers with a suitable transition
diagram.
Ans) In lexical analysis, keywords and identifiers are two important types of tokens
that need to be recognized in the source code. Here's a brief explanation of each and a
suitable transition diagram to illustrate their recognition:
1. Keywords: Keywords are reserved words in a programming language that have a
specific meaning and cannot be used as identifiers. Examples of keywords in many
programming languages include "if", "while", "for", "int", "float", and "void". To
recognize keywords, we can use a simple deterministic finite automaton (DFA) with a
separate chain of states for each keyword. Here's an example of a transition diagram
for the keyword "if":
```
         i        f
--> q0 ----> q1 ----> q2*
```
In this diagram, `q0` is the initial state, and `q2` is the final (accepting) state. When the
DFA receives an input character "i", it transitions to state `q1`. If the next input
character is "f", the DFA transitions to the final state `q2`, which indicates that the
input characters "if" form a keyword token.
2. Identifiers: Identifiers are user-defined names that represent variables, functions, or
other entities in a program. Examples of identifiers include variable names like
"count", function names like "calculate_sum", and class names like "Student". To
recognize identifiers, we can use a more complex DFA that accepts any sequence of
letters, digits, and underscores, as long as the first character is a letter. Here's an
example of a transition diagram for identifiers:
```
        letter               other
--> q0 --------> q1 -----------------> q2*
                 ^   |
                 |   | letter / digit / _
                 +---+
```
In this diagram, `q0` is the initial state, `q1` is the state reached after the first
character, and `q2` is the final (accepting) state. From `q0`, the DFA moves to `q1`
only if the first input character is a letter (or, in many languages, an underscore).
From `q1`, the DFA loops back to `q1` on every further letter, digit, or underscore.
When it finally reads a character that is none of these, it moves to the accepting
state `q2` and retracts that last character, indicating that the characters read so
far form an identifier token.
By using these transition diagrams, we can effectively recognize and categorize
keywords and identifiers in the source code during lexical analysis.
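In practice, many lexical analyzers do not build a separate DFA branch for every keyword; they recognize a lexeme with the identifier DFA and then look it up in a keyword table. A small sketch of that lookup, with an assumed keyword set, is shown below:
```python
# Hedged sketch: scan an identifier-shaped lexeme with the identifier DFA,
# then classify it as a keyword or an identifier via a table lookup.
# The keyword set is an illustrative assumption.
KEYWORDS = {'if', 'while', 'for', 'int', 'float', 'void'}

def classify(lexeme):
    # lexeme is assumed to already match [a-zA-Z_][a-zA-Z0-9_]*
    return 'keyword' if lexeme in KEYWORDS else 'identifier'

print(classify('if'))       # keyword
print(classify('count'))    # identifier
```
This keeps the automaton small and makes adding a new keyword a one-line change.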
9. Write an algorithm to find LR(0) items and give an example
Ans) The LR(0) items are built from two operations, CLOSURE and GOTO, applied to the
augmented grammar. Here is the algorithm:
Input: an augmented grammar G' with start production S' -> S.
Output: the canonical collection C = {I0, I1, ..., In} of LR(0) item sets.
1. Compute I0 = CLOSURE({S' -> .S}), where CLOSURE(I) is obtained by repeating the
following step until no new items can be added: for each item A -> alpha . B beta in I
and each production B -> gamma of G', add the item B -> .gamma.
2. For each item set I already in C and each grammar symbol X, compute
GOTO(I, X) = CLOSURE({A -> alpha X . beta | A -> alpha . X beta is in I}); if the
result is non-empty and not already in C, add it to C.
3. Repeat step 2 until no new item sets can be added.
4. Return C.
Let's apply this algorithm to an example grammar:
```
S' -> S
S  -> AaB
A  -> a
B  -> b
```
First, we take the closure of the initial item:
```
I0 = CLOSURE({ S' -> .S }) = { S' -> .S, S -> .AaB, A -> .a }
```
Then, applying GOTO repeatedly yields the remaining item sets:
```
I1 = GOTO(I0, S) = { S' -> S. }
I2 = GOTO(I0, A) = { S -> A.aB }
I3 = GOTO(I0, a) = { A -> a. }
I4 = GOTO(I2, a) = { S -> Aa.B, B -> .b }
I5 = GOTO(I4, B) = { S -> AaB. }
I6 = GOTO(I4, b) = { B -> b. }
```
Each item set becomes a state of the LR(0) automaton and represents a possible
configuration of the parser as it reads the input.
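The CLOSURE and GOTO computations can also be written out directly for this grammar. In the sketch below, the representation of productions as tuples and items as (head, body, dot-position) triples is an assumption made for illustration:
```python
# Hedged sketch: CLOSURE and GOTO for LR(0) items of the example grammar.
# An item is (head, body, dot) where body is a tuple of symbols.
GRAMMAR = [("S'", ('S',)), ('S', ('A', 'a', 'B')), ('A', ('a',)), ('B', ('b',))]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for head, body, dot in items:
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, b in GRAMMAR:
                    if h == body[dot]:
                        new.add((h, b, 0))
        if new <= items:
            return items
        items |= new

def goto(items, symbol):
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("S'", ('S',), 0)})
print(sorted(I0))                 # items of I0: S' -> .S, S -> .AaB, A -> .a
print(sorted(goto(I0, 'A')))      # items of I2: S -> A.aB
```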
10. Define LR(k) parser. Draw and explain the model of LR parser.
Ans) An LR(k) parser is a bottom-up parser that uses a look-ahead of k symbols to
parse a string of tokens. The LR(k) parser is more powerful than the LL(k) parser and
can handle a larger class of grammars. The name LR comes from scanning the input
Left to right and producing a Rightmost derivation in reverse.
The model of an LR parser consists of four main components: an input buffer, a stack,
a parsing table, and a driver program. The lexer fills the input buffer with the
sequence of tokens to be parsed. The parsing table, divided into an ACTION part and a
GOTO part, tells the driver what to do in each state for each input symbol, and the
stack keeps track of the states the parser has passed through.
Here is a high-level overview of the LR parser model:
1. Start in the initial state, which is usually state 0.
2. Read the next input token from the lexer.
3. Look up the current state and input token in the parsing table.
4. If the table entry is a shift action, push the input token and the state indicated
in the table entry onto the stack, and advance to the next input token.
5. If the table entry is a reduce action, pop the symbols for the right-hand side of the
production from the stack, and push the left-hand side symbol onto the stack.
Transition to the state indicated in the GOTO part of the table entry.
6. If the table entry is an accept action, the input is accepted and parsing is complete.
7. If the table entry is an error action, report an error and halt.
To understand the LR parser model more clearly, suppose we have an LR(1) parsing
table for a small grammar; the LR(1) parser uses a look-ahead of one symbol. Each row
of the table represents a state of the parser, and each column represents a possible
input symbol.
The Action column shows the action to take when the input symbol is encountered in
the current state, and the GOTO column shows the state to transition to after a
reduction.
For example, suppose we have the input string "ab$". The first input symbol is "a",
and the parser starts in state 0. The table entry for state 0 and input symbol "a" is
"shift, s2". This means the parser should shift the input symbol onto the stack and
transition to state 2.
Now the input symbol is "b", and the parser is in state 2. The table entry for state 2 and
input symbol "b" is "shift, s5". This means the parser should shift the input symbol
onto the stack and transition to state 5.
The next input symbol is "$", and the parser is in state 5. The table entry for state 5
and input symbol "$" is "reduce, B -> b", so the parser pops the symbol b from the
stack, pushes B, and uses the GOTO part of the table to find its next state. Shifting
and reducing continue in this way until the accept action is reached and parsing is
complete.
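The driver loop itself is the same for every LR parser; only the tables change. The sketch below hard-codes a hypothetical ACTION/GOTO table for the tiny grammar S -> A B, A -> a, B -> b (the grammar, state numbers, and table entries are assumptions built by hand, not the table referred to above):
```python
# Hedged sketch: a generic shift-reduce driver loop. The grammar
# (S -> A B, A -> a, B -> b), table entries, and state numbers are
# illustrative assumptions, not output from a parser generator.
ACTION = {
    (0, 'a'): ('shift', 3),
    (1, '$'): ('accept',),
    (2, 'b'): ('shift', 5),
    (3, 'b'): ('reduce', 'A', 1),   # A -> a
    (4, '$'): ('reduce', 'S', 2),   # S -> A B
    (5, '$'): ('reduce', 'B', 1),   # B -> b
}
GOTO = {(0, 'A'): 2, (0, 'S'): 1, (2, 'B'): 4}

def parse(tokens):
    stack = [0]                       # stack of states
    pos = 0
    while True:
        state, lookahead = stack[-1], tokens[pos]
        entry = ACTION.get((state, lookahead))
        if entry is None:
            return f'error at token {lookahead!r}'
        if entry[0] == 'shift':
            stack.append(entry[1])    # push the new state
            pos += 1
        elif entry[0] == 'reduce':
            _, lhs, rhs_len = entry
            del stack[-rhs_len:]      # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
            print(f'reduce by {lhs}')
        else:
            return 'accepted'

print(parse(['a', 'b', '$']))   # prints the reductions A, B, S, then 'accepted'
```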
11. State and explain the rules used to construct the LR(1) items.
Ans) LR(1) items are used in constructing the parsing table for an LR(1) parser. An
LR(1) item is a production rule with a dot (.) inserted at various positions within the
rule, together with a lookahead terminal that indicates which token may follow the
production's left-hand side at that point in the input. The LR(1) items are
constructed using the following rules:
1. Start with the augmented grammar and create the initial item: the start production
with the dot at the beginning and the lookahead symbol $.
- Example: [S' -> .S, $]
2. Closure rule: for each item [A -> alpha . B beta, a] in which the dot stands before
a nonterminal B, and for each production B -> gamma, add the item [B -> .gamma, b] for
every terminal b in FIRST(beta a). If beta is empty (or can derive the empty string),
the lookahead a of the original item is simply carried over.
- Example: from [S' -> .S, $] and the production S -> AB, add [S -> .AB, $].
3. Goto rule: to move to a new item set on a grammar symbol X, take every item
[A -> alpha . X beta, a], move the dot one position to the right to obtain
[A -> alpha X . beta, a] with the same lookahead, and then take the closure of the
resulting set.
- Example: on the symbol a, [A -> .a, b] becomes [A -> a., b].
4. An item with the dot at the end of the right-hand side, such as [A -> a., b], is a
complete item; it produces no further items and later yields a "reduce by A -> a on
lookahead b" entry in the parsing table.
5. Repeat rules 2 and 3, starting from CLOSURE({[S' -> .S, $]}), until no new item
sets can be produced; item sets that contain exactly the same items are kept only once.
The resulting collection of LR(1) item sets is used to construct the LR(1) parsing
table: each item set corresponds to a state, the goto function on grammar symbols
gives the transitions between states, and the lookahead symbols of the complete items
determine where the reduce actions are placed when parsing the input string.
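The closure rule with lookaheads can be sketched in code as well. The fragment below uses the classic grammar S' -> S, S -> C C, C -> c C | d, with hand-supplied FIRST sets and the simplifying assumption that no symbol derives the empty string; these choices are illustrative, not part of the question:
```python
# Hedged sketch: CLOSURE for LR(1) items of the grammar
#   S' -> S,  S -> C C,  C -> c C | d
# Items are (head, body, dot, lookahead). FIRST sets are supplied by hand and
# no symbol here derives the empty string, so the lookahead of an added item
# is FIRST of the symbol after the dot, or the item's own lookahead if the
# dot is at the end of the remaining body.
GRAMMAR = [("S'", ('S',)), ('S', ('C', 'C')), ('C', ('c', 'C')), ('C', ('d',))]
NONTERMINALS = {"S'", 'S', 'C'}
FIRST = {'S': {'c', 'd'}, 'C': {'c', 'd'}, 'c': {'c'}, 'd': {'d'}, '$': {'$'}}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for head, body, dot, look in items:
            if dot < len(body) and body[dot] in NONTERMINALS:
                beta = body[dot + 1:]               # symbols after B
                lookaheads = FIRST[beta[0]] if beta else {look}
                for h, b in GRAMMAR:
                    if h == body[dot]:
                        new |= {(h, b, 0, la) for la in lookaheads}
        if new <= items:
            return items
        items |= new

for item in sorted(closure({("S'", ('S',), 0, '$')})):
    print(item)
# includes [S' -> .S, $], [S -> .CC, $], [C -> .cC, c/d], [C -> .d, c/d]
```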
12. Differentiate Top Down parsing and Bottom Up Parsing.
Ans) There are two parsing techniques: top-down parsing and bottom-up parsing.
Top-down parsing builds the parse tree from the root (the start symbol) down to the
leaves, tracing out a leftmost derivation; the parser predicts which production to
apply and expands nonterminals until the input is matched, as in recursive-descent
and LL(1) parsers. Bottom-up parsing builds the parse tree from the leaves (the input
tokens) up to the root, tracing out a rightmost derivation in reverse; the parser
repeatedly reduces handles on a stack until only the start symbol remains, as in
shift-reduce and LR parsers. Bottom-up parsers handle a larger class of grammars,
including left-recursive grammars, which top-down parsers cannot use directly.
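As a small illustration of the top-down style, a recursive-descent parser uses one procedure per nonterminal and derives the input from the start symbol. The toy grammar S -> a S b | c used below is an assumption chosen for illustration:
```python
# Hedged sketch: recursive-descent (top-down) parsing for the toy grammar
#   S -> a S b | c
# One function per nonterminal; the grammar is an illustrative assumption.
def parse(tokens):
    pos = 0

    def expect(symbol):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == symbol:
            pos += 1
        else:
            raise SyntaxError(f'expected {symbol!r} at position {pos}')

    def S():
        if pos < len(tokens) and tokens[pos] == 'a':
            expect('a'); S(); expect('b')   # S -> a S b
        else:
            expect('c')                     # S -> c

    S()
    if pos != len(tokens):
        raise SyntaxError('trailing input')
    return True

print(parse(list('aacbb')))   # True: derives S => aSb => aaSbb => aacbb
```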
15) What is symbol table? What is its need in compiler design?
Ans) Definition:
The symbol table is a data structure that stores each name used in the program
together with the attributes associated with it, such as its value, type, scope, and
address.
It is an important data structure created and maintained by the compiler to keep
track of the semantics of names: it stores scope and binding information for
identifiers and records information about instances of various entities such as
variables, function names, classes, and objects.
It is built in the lexical and syntax analysis phases.
The information is collected by the analysis phases of the compiler and is
used by the synthesis phases of the compiler to generate code.
It is used by the compiler to achieve compile-time efficiency.
It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new entries in the table, for example entries for the
identifiers (names) it encounters.
2. Syntax Analysis: Adds information regarding attribute type, scope,
dimension, line of reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check for
semantics i.e. to verify that expressions and assignments are semantically
correct(type checking) and update it accordingly.
4. Intermediate Code Generation: Refers to the symbol table to know how much
run-time storage is allocated and of what type; the table also helps in recording
information about temporary variables.
5. Code Optimization: Uses information present in the symbol table for
machine-dependent optimization.
6. Target Code generation: Generates code by using address information of
identifier present in the table.
Use of Symbol Table-
Symbol tables are used throughout the compiler. A compiler is a program that scans
the application program (for instance, a C program) and produces machine code.
During this scan the compiler stores the identifiers of the application program in
the symbol table, in the form of name, value, address, and type.
Here name is the identifier's name, value is the value stored in it, address is its
memory location, and type is its data type.
Thus the compiler can keep track of all identifiers together with the necessary
information.
Items stored in Symbol table:
Variable names and constants
Procedure and function names
Literal constants and strings
Compiler generated temporaries
Labels in source languages
Information used by the compiler from Symbol table:
Data type and name
Declaring procedures
Offset in storage
For a structure or record, a pointer to its structure table
For parameters, whether they are passed by value or by reference
Number and type of arguments passed to function
Base Address
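A common way to organize a symbol table with nested scopes is a chain of hash tables, one per scope. The sketch below is illustrative: the stored attributes (type, offset) and the scope-chain design are assumptions, not a prescribed layout:
```python
# Hedged sketch: a scoped symbol table as a chain of dictionaries.
# The stored fields (type, offset) are illustrative assumptions.
class SymbolTable:
    def __init__(self, parent=None):
        self.entries = {}            # name -> attribute dict
        self.parent = parent         # enclosing scope, or None for globals

    def insert(self, name, **attributes):
        if name in self.entries:
            raise KeyError(f'{name!r} already declared in this scope')
        self.entries[name] = attributes

    def lookup(self, name):
        scope = self
        while scope is not None:     # walk outward through enclosing scopes
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None                  # undeclared identifier

globals_ = SymbolTable()
globals_.insert('rate', type='float', offset=0)
locals_ = SymbolTable(parent=globals_)
locals_.insert('count', type='int', offset=4)
print(locals_.lookup('rate'))        # found in the enclosing (global) scope
print(locals_.lookup('missing'))     # None -> would be reported as an error
```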
16.Explain S-Attributed & L-Attributed definitions in detail
Ans) Before defining S-attributed and L-attributed SDTs, here is a brief introduction
to the two types of attributes an attribute grammar can use: synthesized and
inherited.
1. Synthesized attributes – A Synthesized attribute is an attribute of the
non-terminal on the left-hand side of a production. Synthesized attributes
represent information that is being passed up the parse tree. The attribute
can take value only from its children (Variables in the RHS of the
production). For example, if A -> BC is a production of a grammar and A’s attribute
depends on B’s attributes or C’s attributes, then it is a synthesized attribute.
2. Inherited attributes – An attribute of a nonterminal on the right-hand side of
a production is called an inherited attribute. The attribute can take value
either from its parent or from its siblings (the LHS nonterminal or the other
variables in the RHS of the production). For example, if A -> BC is a production of a
grammar and B’s attribute depends on A’s attributes or C’s attributes, then it is an
inherited attribute.
Now, let’s discuss S-attributed and L-attributed SDTs.
1. S-attributed SDT :
If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
S-attributed SDTs can be evaluated during bottom-up parsing, because the value of a
parent node depends only on the values of its child nodes.
Semantic actions are placed at the rightmost end of the RHS, i.e., after the last
grammar symbol of the production.
2. L-attributed SDT:
If an SDT uses both synthesized and inherited attributes, with the restriction that
an inherited attribute may take its value only from the parent or from siblings to
its left, it is called an L-attributed SDT.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right
traversal of the parse tree.
Semantic actions can be placed anywhere in the RHS.
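An S-attributed definition is easy to picture in code: every node's val attribute is synthesized purely from its children, so a single bottom-up (post-order) walk of the parse tree evaluates it. The tuple-based tree and the grammar rules in the comments below are illustrative assumptions:
```python
# Hedged sketch: S-attributed evaluation. Every node's 'val' attribute is
# synthesized from its children only, so a bottom-up (post-order) walk
# suffices. The tuple-based tree shape is an illustrative assumption.
# Rules mirrored: E -> E + T  { E.val = E1.val + T.val }
#                 T -> T * F  { T.val = T1.val * F.val }
#                 F -> digit  { F.val = digit.lexval }
def val(node):
    if isinstance(node, int):              # F -> digit
        return node
    op, left, right = node
    if op == '+':                          # E -> E + T
        return val(left) + val(right)
    if op == '*':                          # T -> T * F
        return val(left) * val(right)
    raise ValueError(f'unknown operator {op!r}')

# Parse tree for 3 + 4 * 5, evaluated bottom-up:
print(val(('+', 3, ('*', 4, 5))))          # 23
```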