Atcd U 4
Peephole optimization is a type of compiler optimization technique that involves analyzing a small section of code, called a "peephole," and making local changes to improve the efficiency or performance of the code. The peephole optimization technique is typically applied after more general optimizations have been performed, and it is intended to fine-tune the code by making small, targeted changes.

Peephole optimization can involve a number of different techniques, but some of the most common include:

1. Constant folding: This involves evaluating expressions that contain only constants at compile time, and replacing the expression with the result of the evaluation. For example, the expression "2+2" can be replaced with "4".

2. Strength reduction: This involves replacing expensive operations, such as multiplication or division, with less expensive operations, such as addition or bit shifts. For example, the expression "x*2" can be replaced with "x+x".

3. Dead code elimination: This involves removing code that is never executed, such as code that follows an unconditional "return" statement.

4. Loop unrolling: This involves duplicating the code inside a loop in order to reduce the number of iterations (and the associated loop-control overhead) required to execute the loop.

5. Instruction combining: This involves combining two or more instructions into a single instruction, in order to reduce the number of instructions required to execute the program.

Overall, the goal of peephole optimization is to make small, targeted changes to the code that can have a big impact on its efficiency and performance. By analyzing the code at a local level and making targeted optimizations, compilers can produce more efficient code that runs faster and uses fewer resources.
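To make the first two techniques concrete, here is a minimal sketch (not part of the original notes) of a peephole pass over three-address-style instructions; the tuple format, the opcode names, and the `peephole` function are assumptions made purely for illustration.

```
# Minimal peephole-pass sketch. Instructions are hypothetical tuples:
# (op, dest, arg1, arg2), e.g. ("mul", "t2", "x", 2).

def peephole(instrs):
    out = []
    for op, dest, a, b in instrs:
        # Constant folding: both operands are literal numbers.
        if op == "add" and isinstance(a, int) and isinstance(b, int):
            out.append(("load", dest, a + b, None))
        # Strength reduction: multiplication by 2 becomes an addition.
        elif op == "mul" and b == 2:
            out.append(("add", dest, a, a))
        else:
            out.append((op, dest, a, b))
    return out

code = [("add", "t1", 2, 2),      # 2 + 2  ->  load 4
        ("mul", "t2", "x", 2)]    # x * 2  ->  x + x
print(peephole(code))
# [('load', 't1', 4, None), ('add', 't2', 'x', 'x')]
```

A real peephole optimizer would slide a small window over the generated code and repeat such rewrites until no pattern matches any more.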

Unit-4

1. Define compiler. Describe the logical phases of a compiler with a neat sketch, and show the output of each phase using the example of the following statement:

position := initial + rate * 60

Ans) A compiler is a software program that transforms source code written in a programming language into machine code that can be executed by a computer. The logical phases of a compiler include:

1. Lexical Analysis: This phase involves breaking down the source code into a sequence of tokens, where each token represents a valid unit of the programming language. For example, in the statement "position := initial + rate * 60", the tokens would be "position", ":=", "initial", "+", "rate", "*", and "60". The output of this phase is a stream of tokens.

2. Syntax Analysis: This phase involves checking the tokens produced in the previous phase against the grammar rules of the programming language to ensure that they form a valid expression or statement. This phase creates a parse tree or an abstract syntax tree (AST) representing the statement. For example, the parse tree for the statement "position := initial + rate * 60" would look like the following:

```
              :=
            /    \
    position      +
                /   \
         initial     *
                   /   \
               rate     60
```

3. Semantic Analysis: This phase involves checking the meaning of the program and whether it conforms to the language specifications. This includes type checking, name resolution, and the detection of semantic errors. For example, if the variable "rate" was not previously defined in the program, a semantic error would be raised.

4. Intermediate Code Generation: This phase involves transforming the AST produced in the syntax analysis phase into an intermediate representation. This intermediate code is generally platform-independent and is used to optimize the code and make it more efficient.

5. Code Optimization: This phase involves optimizing the intermediate code to improve the performance and efficiency of the final executable code. This includes dead code elimination, constant folding, and loop unrolling.

6. Code Generation: This is the final phase of the compiler process. In this phase, the optimized intermediate code is translated into machine code for the target platform. The output of this phase is an executable file that can be run on the target machine.

The output of each phase of the compiler process for the statement "position := initial + rate * 60" would be as follows:

1. Lexical Analysis:

```
Token Stream:
position, :=, initial, +, rate, *, 60
```
2. Syntax Analysis:

```
AST:
              :=
            /    \
    position      +
                /   \
         initial     *
                   /   \
               rate     60
```

3. Semantic Analysis: No errors.

4. Intermediate Code Generation:

```
Three-address code:
t1 = rate * 60
t2 = initial + t1
position = t2
```
5. Code Optimization: No optimizations.

6. Code Generation:

```
Machine Code:
LOAD  rate
MUL   60
STORE t1
LOAD  initial
ADD   t1
STORE position
```

2) Explain the chief functions of the lexical analysis phase.

Ans) The main function of the lexical analysis phase, also known as the scanner or tokenizer, is to break down the input source code into a sequence of tokens. Tokens are the smallest units of meaning in a programming language and are used to represent keywords, identifiers, operators, literals, and other elements of the language.

Here are some of the chief functions of the lexical analysis phase:

1. Tokenization: The lexical analyzer scans the source code character by character and groups the characters into tokens. Each token represents a specific unit of meaning, such as a keyword, identifier, operator, or literal.

2. Removal of whitespace and comments: The lexical analyzer removes whitespace characters, such as spaces and tabs, and comments from the source code during tokenization. This makes the source code more compact and easier for later phases to process.

3. Error handling: The lexical analyzer detects and reports lexical errors, such as misspelled keywords or unrecognized characters, to the user.

4. Symbol table management: The lexical analyzer maintains a symbol table that stores information about the identifiers used in the program, such as their names, types, and memory locations.

5. Code simplification: The lexical analyzer can perform some simple simplifications, such as combining the characters of a multi-character operator (for example ":=" or "<=") into a single token, to make the subsequent phases of the compiler more efficient.

Overall, the lexical analysis phase is an important component of the compilation process that plays a critical role in transforming the input source code into a form that can be easily processed by the subsequent phases of the compiler.
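As an illustration of tokenization (not part of the original answer), the sketch below scans the statement from question 1 with a regular-expression-based scanner; the token names and the patterns in TOKEN_SPEC are assumptions chosen for this example.

```
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("ASSIGN", r":="),
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"[ \t]+"),          # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":           # drop whitespace
            yield (match.lastgroup, match.group())

print(list(tokenize("position := initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', ':='), ('ID', 'initial'),
#  ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUMBER', '60')]
```

Keyword recognition and symbol-table insertion would normally be layered on top of a loop like this one.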
3. What is the role of transition diagrams in the construction of lexical analyzer?

Ans) Transition diagrams, also known as finite state machines, are commonly used in the construction of lexical analyzers. A transition diagram is a graphical representation of a finite state machine that shows how the machine moves from one state to another in response to input symbols.

The role of transition diagrams in the construction of lexical analyzers is as follows:

1. Specification of token patterns: The lexical analyzer is designed to recognize specific patterns of characters in the input stream and convert them into tokens. Transition diagrams can be used to specify the regular expressions that define these patterns, such as identifiers, literals, and operators.

2. Construction of the finite state machine: The transition diagrams are then used to construct a finite state machine that implements the regular expressions specified in the previous step. Each state in the machine corresponds to a specific pattern that the lexical analyzer can recognize.

3. Generation of code: Once the finite state machine has been constructed, the next step is to generate the code that implements the lexical analyzer. This code typically consists of a loop that reads input characters one by one and uses the finite state machine to recognize the corresponding tokens.

4. Testing and debugging: Finally, the lexical analyzer is tested and debugged to ensure that it correctly recognizes all the tokens in the input stream and handles errors and edge cases gracefully. The transition diagrams can be useful in this process, as they provide a visual representation of the state machine and help developers identify potential issues or areas for improvement.

In summary, transition diagrams are an important tool in the construction of lexical analyzers, as they help developers specify token patterns, construct finite state machines, generate code, and test and debug the analyzer.
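As a small sketch of steps 2 and 3 (an illustration, not from the original answer), a transition diagram can be stored directly as a table that the generated scanner walks over; the diagram below, which recognizes unsigned integers (digit+), and the helper names are assumptions made for this example.

```
# A transition diagram encoded as a table: state -> {input class -> next state}.
# This hypothetical diagram recognizes unsigned integers such as "60" (digit+).
TRANSITIONS = {
    0: {"digit": 1},        # start state: a digit begins a number
    1: {"digit": 1},        # keep consuming digits
}
ACCEPTING = {1: "NUMBER"}

def classify(ch):
    return "digit" if ch.isdigit() else "other"

def run(diagram, accepting, text):
    """Walk the diagram over text; return (token, lexeme) for the longest
    prefix that ends in an accepting state, or None if nothing matches."""
    state, last_accept = 0, None
    for i, ch in enumerate(text):
        state = diagram.get(state, {}).get(classify(ch))
        if state is None:
            break
        if state in accepting:
            last_accept = (accepting[state], text[: i + 1])
    return last_accept

print(run(TRANSITIONS, ACCEPTING, "60;"))   # ('NUMBER', '60')
```

The same driver loop can be reused for every diagram; only the transition table changes.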

4. How is a finite automaton used to represent tokens and perform lexical analysis? Give examples.

Ans) A finite automaton, also known as a finite state machine, is a mathematical model used to recognize regular languages, such as those defined by the regular expressions used to specify tokens in programming languages. In the context of lexical analysis, a finite automaton can be used to represent the different types of tokens in a program and to recognize these tokens in the input stream.

Here is an example of how a finite automaton can be used to recognize identifiers in a programming language:

1. Define the regular expression for identifiers: In most programming languages, an identifier is a sequence of letters, digits, and underscores that starts with a letter or underscore. The regular expression for identifiers can be expressed as [a-zA-Z_][a-zA-Z0-9_]*.

2. Construct the finite automaton: The finite automaton for this regular expression consists of a series of states, each of which corresponds to a specific character or group of characters in the input stream. The automaton starts in an initial state and transitions to a new state for each character it reads from the input stream.

3. Label the states: In the case of the identifier regular expression, we can label the states based on the current position in the identifier. For example, the initial state can be labeled "start", the next state can be labeled "letter_or_underscore", and subsequent states can be labeled "letter_or_digit_or_underscore".

4. Recognize tokens: As the automaton reads characters from the input stream, it moves from one state to another based on the current input and the current state. When it is in an accepting state and no further transition is possible, it recognizes the corresponding token, in this case an identifier.

In this example, the finite automaton is used to represent the regular expression for a specific type of token and to recognize instances of that token in the input stream. This is an essential part of the lexical analysis phase of a compiler, which converts the input source code into a sequence of tokens that can be further processed by the compiler.
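Here is a sketch (an illustration, not from the original answer) of the identifier automaton described above, simulated with explicit states for the pattern [a-zA-Z_][a-zA-Z0-9_]*; the state names and the function are assumptions for this example, and only a whole-string check is shown.

```
# States of the identifier automaton:
#   "start" - nothing read yet
#   "ident" - at least one valid identifier character read (accepting state)
def is_identifier(s):
    state = "start"
    for ch in s:
        if state == "start":
            if ch.isalpha() or ch == "_":
                state = "ident"          # first character: letter or underscore
            else:
                return False             # no transition defined: reject
        elif state == "ident":
            if not (ch.isalnum() or ch == "_"):
                return False             # no transition defined: reject
    return state == "ident"              # accept only in the accepting state

print(is_identifier("rate"))     # True
print(is_identifier("_tmp1"))    # True
print(is_identifier("60"))       # False (starts with a digit)
```

A real scanner would stop at the first character with no transition, push it back onto the input, and emit the identifier recognized so far instead of rejecting the whole string.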

5. Differentiate between token, lexeme and pattern with examples.

Ans) A token is the category, or class name, that the lexical analyzer hands to the parser, often together with an attribute value; examples of tokens are id, num, relop, and the individual keywords. A lexeme is the actual sequence of characters in the source program that is matched as an instance of a token; in the statement "position := initial + rate * 60", the lexemes "position", "initial", and "rate" all belong to the token id, and the lexeme "60" belongs to the token num. A pattern is the rule, usually written as a regular expression, that describes the set of lexemes a token can have; for example, the pattern [a-zA-Z_][a-zA-Z0-9_]* describes the lexemes of id, and [0-9]+ describes the lexemes of num.
6. Explain the recognition of keywords and identifiers with a suitable transition diagram.

Ans) In lexical analysis, keywords and identifiers are two important types of tokens that need to be recognized in the source code. Here is a brief explanation of each and a suitable transition diagram to illustrate their recognition:

1. Keywords: Keywords are reserved words in a programming language that have a specific meaning and cannot be used as identifiers. Examples of keywords in many programming languages include "if", "while", "for", "int", "float", and "void". To recognize keywords, we can use a simple deterministic finite automaton (DFA) with a separate path of states for each keyword. Here is an example of a transition diagram for the keyword "if":

```
          i          f
  --> q0 ----> q1 ----> q2*
```

In this diagram, `q0` is the initial state, and `q2` is the final (accepting) state. When the DFA receives an input character "i", it transitions to state `q1`. If the next input character is "f", the DFA transitions to the final state `q2`, which indicates that the input characters "if" form a keyword token.

2. Identifiers: Identifiers are user-defined names that represent variables, functions, or other entities in a program. Examples of identifiers include variable names like "count", function names like "calculate_sum", and class names like "Student". To recognize identifiers, we can use a DFA that accepts any sequence of letters, digits, and underscores, as long as the first character is a letter or an underscore. Here is an example of a transition diagram for identifiers:

```
           letter or _
  --> q0 ---------------> q1*     (q1 loops back to itself on letter, digit, or _)
```

In this diagram, `q0` is the initial state and `q1` is the final (accepting) state. When the DFA receives a letter or an underscore as its first input character, it transitions from `q0` to `q1`. From `q1`, the DFA transitions back to `q1` on every further letter, digit, or underscore. When the DFA encounters a character that is not a letter, digit, or underscore, there is no transition defined; since the DFA is in the accepting state `q1`, the characters read so far form an identifier token (the offending character is pushed back onto the input, and the lexeme is checked against the keyword table, because keywords match the same pattern).

By using these transition diagrams, we can effectively recognize and categorize keywords and identifiers in the source code during lexical analysis.
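In practice, lexical analyzers usually do not build a separate diagram for every keyword; they run the single identifier diagram and then look the recognized lexeme up in a keyword table. The sketch below (an assumption made for illustration, not part of the original answer) shows that approach; the keyword set is hypothetical.

```
# Hypothetical keyword set; a real language defines its own reserved words.
KEYWORDS = {"if", "while", "for", "int", "float", "void"}

def classify_word(lexeme):
    """Classify a lexeme that already matched [a-zA-Z_][a-zA-Z0-9_]*."""
    return ("KEYWORD", lexeme) if lexeme in KEYWORDS else ("ID", lexeme)

print(classify_word("if"))      # ('KEYWORD', 'if')
print(classify_word("count"))   # ('ID', 'count')
```

This keeps the automaton small, since keywords and identifiers share the same pattern and differ only in the final table lookup.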

9. Write an algorithm to find LR(0) items and give an example

Ans) LR(0) items are found with two operations, CLOSURE and GOTO, applied to the augmented grammar.

Algorithm CLOSURE(I):

Input: a grammar G in augmented form with start symbol S', and a set of LR(0) items I.
Output: the closure of I.

1. Add every item of I to CLOSURE(I).
2. For each item A -> alpha . B beta in CLOSURE(I) and for each production B -> gamma in G, add the item B -> .gamma if it is not already present.
3. Repeat step 2 until no new items can be added.
4. Return the resulting set.

The canonical collection of sets of LR(0) items is then built by starting from I0 = CLOSURE({ S' -> .S }) and repeatedly computing GOTO(I, X) = CLOSURE({ A -> alpha X . beta : A -> alpha . X beta is in I }) for every set I found so far and every grammar symbol X, until no new sets appear.

Let's apply this algorithm to an example grammar:

```
S' -> S
S  -> AaB
A  -> a
B  -> b
```

First, the initial set is the closure of { S' -> .S }:

```
I0 = { S' -> .S,  S -> .AaB,  A -> .a }
```

Then GOTO is applied to every set on every grammar symbol until no new sets of items appear:

```
I1 = GOTO(I0, S) = { S' -> S. }
I2 = GOTO(I0, A) = { S -> A.aB }
I3 = GOTO(I0, a) = { A -> a. }
I4 = GOTO(I2, a) = { S -> Aa.B,  B -> .b }
I5 = GOTO(I4, B) = { S -> AaB. }
I6 = GOTO(I4, b) = { B -> b. }
```

Each set of items represents a possible configuration (state) of the parser as it reads the input.
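The CLOSURE and GOTO operations above can be sketched as follows (an illustrative assumption, not part of the original answer): items are represented as (head, body, dot-position) tuples, the grammar is the one from the example, and "Z" stands in for the augmented start symbol S'.

```
# Grammar from the example; bodies are strings of single-character symbols.
GRAMMAR = {"Z": ["S"], "S": ["AaB"], "A": ["a"], "B": ["b"]}   # Z stands for S'
NONTERMINALS = set(GRAMMAR)

def closure(items):
    items, changed = set(items), True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMINALS:    # dot before a nonterminal B
                for production in GRAMMAR[body[dot]]:
                    item = (body[dot], production, 0)            # add B -> .gamma
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("Z", "S", 0)})
print(sorted(I0))              # [('A', 'a', 0), ('S', 'AaB', 0), ('Z', 'S', 0)]
print(sorted(goto(I0, "A")))   # [('S', 'AaB', 1)]
```

Repeatedly applying goto to every set found so far, for every grammar symbol, yields the canonical collection I0-I6 listed above.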
10. Define LR(k) parser. Draw and explain the model of LR parser.

Ans) An LR(k) parser is a bottom-up parser that uses a look-ahead of k input symbols to parse a string of tokens. An LR(k) parser is more powerful than an LL(k) parser and can handle a larger class of grammars. The name LR comes from scanning the input Left to right and producing a Rightmost derivation in reverse.

The model of an LR parser consists of three main components: a lexer (input buffer), a parsing table, and a stack. The lexer reads the input string and produces a sequence of tokens, which are then consumed by the parser. The parsing table has two parts, ACTION and GOTO, and encodes the shift, reduce, accept, and error moves of the parser. The stack is used to keep track of the parser's current states and the grammar symbols recognized so far.

Here is a high-level overview of the LR parser model:

1. Start in the initial state, which is usually state 0.
2. Read the next input token from the lexer.
3. Look up the current state and input token in the parsing table.
4. If the table entry is a shift action, push the input token onto the stack and transition to the state indicated in the table entry.
5. If the table entry is a reduce action, pop the symbols for the right-hand side of the production from the stack, and push the left-hand side symbol onto the stack. Transition to the state indicated in the GOTO part of the table entry.
6. If the table entry is an accept action, the input is accepted and parsing is complete.
7. If the table entry is an error action, report an error and halt.


To understand the LR parser model more clearly, let's consider an example LR(1) parsing table. Here, the LR(1) parser is using a look-ahead of one symbol. Each row of the table represents a state of the parser, and each column represents a possible input symbol. The Action column shows the action to take when the input symbol is encountered in the current state, and the GOTO column shows the state to transition to after a reduction.

For example, suppose we have the input string "ab$". The first input symbol is "a", and the parser starts in state 0. The table entry for state 0 and input symbol "a" is "shift, s2". This means the parser should shift the input symbol onto the stack and transition to state 2.

Now the input symbol is "b", and the parser is in state 2. The table entry for state 2 and input symbol "b" is "shift, s5". This means the parser should shift the input symbol onto the stack and transition to state 5.

The next input symbol is "$", and the parser is in state 5. The table entry for state 5 and input symbol "$" is "reduce, B -> b".

11. State and explain the rules used to construct the LR(1) items.

Ans) LR(1) items are used in constructing the parsing table for an LR(1) parser. An LR(1) item is a production rule with a dot (.) marking a position inside its right-hand side, together with a lookahead symbol that indicates which terminal can follow the rule in the input stream. The LR(1) items are constructed using the following rules:

1. Start with the augmented grammar and create the initial item, which consists of the new start production with the dot at the beginning and the lookahead symbol $.
   - Example: [S' -> .S, $]

2. Closure: for each item with the dot before a nonterminal, [A -> alpha . B beta, a], and for each production B -> gamma of that nonterminal, add the item [B -> .gamma, b] for every terminal b in FIRST(beta a). The lookahead b is whatever can follow B at this position.
   - Example: from [S -> .AB, $], with the production A -> a and FIRST(B $) = { b }, add [A -> .a, b].

3. Repeat step 2 until no new items can be added; the result is the closure of the item set.

4. GOTO: for each item [A -> alpha . X beta, a] in a set I and each grammar symbol X, put [A -> alpha X . beta, a] into GOTO(I, X) and then take the closure of that set. This moves the dot one position to the right over X, whether X is a terminal or a nonterminal.

5. Items that differ only in their lookahead symbols may be written together, e.g. [C -> .d, c/d]; two item sets are treated as the same state only if they contain exactly the same items with the same lookaheads.

6. Repeat steps 2-5, starting from the closure of the initial item, until no new item sets can be created.

The resulting collection of sets of LR(1) items is used to construct the LR(1) parsing table. Each set of LR(1) items corresponds to a state in the parsing table, the GOTO function gives the transitions between the states, and the lookahead symbols determine where the reduce actions are placed. The table is constructed by computing the closure and GOTO sets for each state, which determine the actions to take when parsing the input string.
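A sketch of the closure rule with lookaheads (an illustration, not from the original answer): the grammar S' -> S, S -> C C, C -> c C | d and its precomputed FIRST sets are assumptions chosen for this example, with "Z" standing in for S'.

```
# Items are (head, body, dot, lookahead). The grammar and its FIRST sets are
# hardcoded for this example; a full implementation would compute FIRST itself.
GRAMMAR = {"Z": ["S"], "S": ["CC"], "C": ["cC", "d"]}              # Z stands for S'
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}

def first_of(symbols, lookahead):
    # FIRST of the string `symbols` followed by `lookahead` (no nullable symbols here).
    return FIRST[symbols[0]] if symbols else {lookahead}

def closure(items):
    items, changed = set(items), True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:         # dot before a nonterminal B
                for b in first_of(body[dot + 1:], la):           # lookaheads from FIRST(beta a)
                    for production in GRAMMAR[body[dot]]:
                        item = (body[dot], production, 0, b)
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

for item in sorted(closure({("Z", "S", 0, "$")})):
    print(item)
# ('C', 'cC', 0, 'c')  ('C', 'cC', 0, 'd')  ('C', 'd', 0, 'c')
# ('C', 'd', 0, 'd')   ('S', 'CC', 0, '$')  ('Z', 'S', 0, '$')
```

The lookaheads c and d on the C-items come from FIRST(C $) = { c, d }, which is exactly rule 2 above.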


12. Differentiate Top Down parsing and Bottom Up Parsing.

Ans) There are two types of parsing techniques: top-down parsing and bottom-up parsing. Top-down parsing is a technique that starts at the highest level of the parse tree (the start symbol) and works down the parse tree by applying the rules of the grammar, tracing out a leftmost derivation; recursive-descent and LL(1) parsers work this way. Bottom-up parsing is a technique that starts at the lowest level of the parse tree (the input tokens) and works up the parse tree by applying the rules of the grammar in reverse, tracing out a rightmost derivation in reverse; shift-reduce parsers such as LR, SLR, and LALR parsers work this way.


15) What is symbol table? What is its need in compiler design?

Ans) Definition: The symbol table is defined as a set of name and attribute (value) pairs.

The symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables, i.e. it stores scope and binding information about names, and information about instances of various entities such as variable and function names, classes, objects, etc.

• It is built in the lexical analysis and syntax analysis phases.
• The information is collected by the analysis phases of the compiler and is used by the synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by the various phases of the compiler as follows:

1. Lexical Analysis: creates new entries in the table, for example entries for tokens such as identifiers.
2. Syntax Analysis: adds information regarding attribute type, scope, dimension, line of reference, use, etc. to the table.
3. Semantic Analysis: uses the available information in the table to check that expressions and assignments are semantically correct (type checking) and updates it accordingly.
4. Intermediate Code Generation: refers to the symbol table to know how much run-time storage is allocated and of what type, and the table helps in adding information about temporary variables.
5. Code Optimization: uses information present in the symbol table for machine-dependent optimization.
6. Target Code Generation: generates code by using the address information of the identifiers present in the table.

Use of Symbol Table:

Symbol tables are typically used in compilers. Basically, a compiler is a program which scans the application program (for instance, your C program) and produces machine code. During this scan the compiler stores the identifiers of the application program in the symbol table. These identifiers are stored in the form of name, value, address, and type. Here the name represents the name of the identifier, the value represents the value stored in the identifier, the address represents the memory location of that identifier, and the type represents the data type of the identifier. Thus the compiler can keep track of all the identifiers with all the necessary information.

Items stored in the symbol table:

• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler-generated temporaries
• Labels in source languages

Information used by the compiler from the symbol table:

• Data type and name
• Declaring procedure
• Offset in storage
• If a structure or record, a pointer to the structure table
• For parameters, whether the parameter is passed by value or by reference
• Number and type of arguments passed to a function
• Base address
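A minimal sketch of a symbol table with nested scopes follows (the class name, fields, and methods are assumptions made for illustration; a production compiler stores many more attributes per entry).

```
class SymbolTable:
    """Hypothetical symbol table: one dictionary of entries per nested scope."""
    def __init__(self):
        self.scopes = [{}]                      # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, type_, address=None):
        # A new entry goes into the innermost (current) scope.
        self.scopes[-1][name] = {"type": type_, "address": address}

    def lookup(self, name):
        # Search from the innermost scope outward, as name resolution requires.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                             # undeclared identifier

table = SymbolTable()
table.insert("rate", "float", address=0x1000)
table.enter_scope()
table.insert("i", "int")
print(table.lookup("rate"))   # {'type': 'float', 'address': 4096}
table.exit_scope()
print(table.lookup("i"))      # None (out of scope)
```

The lexical analyzer would call insert when it sees a new identifier, and later phases would call lookup to read or extend the stored attributes.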

16.Explain S-Attributed & L-Attributed definitions in detail

Ans) Before coming to S-attributed and L-attributed SDTs, here is a brief introduction to the two types of attributes: synthesized and inherited.

1. Synthesized attributes: A synthesized attribute is an attribute of the nonterminal on the left-hand side of a production. Synthesized attributes represent information that is passed up the parse tree. The attribute can take its value only from its children (the symbols in the RHS of the production). For example, if A -> BC is a production of a grammar and A's attribute depends on B's attributes or C's attributes, then it is a synthesized attribute.

2. Inherited attributes: An attribute of a nonterminal on the right-hand side of a production is called an inherited attribute. The attribute can take its value either from its parent or from its siblings (the symbols in the LHS or RHS of the production). For example, if A -> BC is a production of a grammar and B's attribute depends on A's attributes or C's attributes, then it is an inherited attribute.

Now, let's discuss S-attributed and L-attributed SDTs.

1. S-attributed SDT:
• If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
• S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
• Semantic actions are placed at the rightmost position of the RHS.

2. L-attributed SDT:
• If an SDT uses both synthesized attributes and inherited attributes, with the restriction that an inherited attribute can take values only from the parent and from left siblings, it is called an L-attributed SDT.
• Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
• Semantic actions can be placed anywhere in the RHS.
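As a sketch of an S-attributed definition in action (an illustration not present in the original), the function below evaluates the synthesized attribute val bottom-up for a small expression grammar; the tuple-based parse-tree representation is an assumption.

```
# S-attributed SDT sketch: the attribute `val` is synthesized, i.e. computed
# only from the attributes of a node's children, bottom-up.
#   E -> E + T   { E.val = E1.val + T.val }
#   T -> T * F   { T.val = T1.val * F.val }
#   F -> digit   { F.val = digit.lexval }

def val(node):
    """Return the synthesized attribute val of a parse-tree node.
    Nodes are hypothetical tuples: ("num", n), ("+", left, right) or ("*", left, right)."""
    if node[0] == "num":
        return node[1]
    left, right = val(node[1]), val(node[2])     # evaluate children first (bottom-up)
    return left + right if node[0] == "+" else left * right

# Parse tree of "2 + 3 * 4".
tree = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))
print(val(tree))   # 14
```

Because every action reads only child attributes, the evaluation order falls out naturally during bottom-up (LR) parsing, which is why S-attributed SDTs suit bottom-up parsers.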
