1. Introduction to compilers

Compilers & translators

Translator
A program written in a high-level language is called source code. To convert source code into machine code, translators are needed. A translator takes a program written in a source language as input and converts it into a program in a target language as output. It also detects and reports errors during translation.

Roles of a translator:
• Translating the high-level language program given as input into an equivalent machine language program.
• Providing diagnostic messages wherever the programmer violates the specification of the high-level language.

Different types of translators
The different types of translators are as follows:

Compiler
A compiler is a translator that converts programs in a high-level language into a low-level language. It translates the entire program and reports the errors in the source program that it encounters during translation.

[Figure: source program → Compiler → target program, with error messages as a side output]

Interpreter
An interpreter is a translator that converts programs in a high-level language into a low-level language. It translates line by line and reports an error as soon as it is encountered during translation. It directly executes the operations specified in the source program on the input given by the user. It gives better error diagnostics than a compiler.

[Figure: source program and input → Interpreter → output]

By Omkar Javadwar

Differences between compiler and interpreter

Sl. No. | Compiler | Interpreter
1 | Translates the program as a whole. | Translates statement by statement.
2 | Execution is faster. | Execution is slower.
3 | Requires more memory, as linking is needed for the generated intermediate object code. | Memory usage is efficient, as no intermediate object code is generated.
4 | Debugging is hard, as error messages are generated only after scanning the entire program. | Translation stops when the first error is met, so debugging is easy.
5 | Programming languages like C and C++ use compilers. | Programming languages like Python, BASIC, and Ruby use interpreters.

Assembler
An assembler is a translator that translates assembly language code into machine language code.

[Figure: assembly language code → Assembler → machine language code]

• Lexical Analysis
The first phase of the compiler works as a text scanner. It scans the source code as a stream of characters and converts it into meaningful lexemes. The lexical analyzer represents these lexemes as tokens.

• Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e. the parser checks whether the expression made by the tokens is syntactically correct.

[Fig.: Phases of a compiler]

• Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the language. For example, it checks that values are assigned between compatible data types and flags operations such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types, and expressions, and checks whether identifiers are declared before use. It produces an annotated syntax tree as output.

• Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine and lies between the high-level language and the machine language. The intermediate code should be generated in such a way that it is easy to translate into the target machine code.

• Code Optimization
The next phase optimizes the intermediate code.
Optimization can be thought of as removing unnecessary lines of code and arranging the sequence of statements so that the program runs faster without wasting resources (CPU, memory).

• Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. It translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code would.

• Symbol Table
The symbol table is a data structure maintained throughout all phases of a compiler. All identifier names, along with their types, are stored here. The symbol table lets the compiler quickly search for an identifier's record and retrieve it. It is also used for scope management.

• Cross-compiler
A cross compiler is a compiler that runs on one platform and generates executable code for a different platform.

Bootstrapping
Bootstrapping is a process in which a simple language is used to translate a more complicated program, which in turn can handle an even more complicated program, and so on.

Writing a compiler for any high-level language is a complicated process, and it takes a lot of time to write one from scratch. Hence a simple language is used to generate target code in some stages.

• Bootstrapping is widely used in compiler development.
• Bootstrapping is used to produce a self-hosting compiler, that is, a compiler that can compile its own source code.
• A bootstrap compiler is used to compile the compiler; the compiled compiler can then compile everything else, as well as future versions of itself.

A compiler can be characterized by three languages:
1. Source Language
2. Target Language
3. Implementation Language

The T-diagram shows a compiler ${}^{S}C_{I}^{T}$ for source S and target T, implemented in I.

To understand the bootstrapping technique clearly, consider the following scenario. Follow these steps to produce a compiler for a new language L on machine A:

1. Create a compiler ${}^{S}C_{A}^{A}$ for a subset S of the desired language L, written in language A. This compiler runs on machine A.
2. Create a compiler ${}^{L}C_{S}^{A}$ for language L, written in the subset S of L.
3. Compile ${}^{L}C_{S}^{A}$ using the compiler ${}^{S}C_{A}^{A}$ to obtain ${}^{L}C_{A}^{A}$. ${}^{L}C_{A}^{A}$ is a compiler for language L that runs on machine A and produces code for machine A.

[Figure: T-diagrams for the three steps]

The process described by the T-diagrams is called bootstrapping.

Compiler construction tools
The compiler writer can use specialized tools that help in implementing the various phases of a compiler. These tools assist in the creation of an entire compiler or of its parts. They are also known as compiler-compilers, compiler-generators, or translator-writing systems. They use a specific language or algorithm for specifying and implementing a component of the compiler.

1. Parser Generators
Input: Grammatical description of a programming language.
Output: Syntax analyzers.
A parser generator takes the grammatical description of a programming language and produces a syntax analyzer.

2. Scanner Generators
Input: Regular expression description of the tokens of a language.
Output: Lexical analyzers.
A scanner generator generates lexical analyzers from a regular expression description of the tokens of a language.

3. Syntax-Directed Translation Engines
Input: Parse tree.
Output: Intermediate code.
Syntax-directed translation engines produce collections of routines that walk a parse tree and generate intermediate code.

4. Automatic Code Generators
Input: Intermediate language.
Output: Machine language.
A code generator takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for a target machine.
5. Data-Flow Analysis Engines
A data-flow analysis engine gathers information about how values are transmitted from one part of a program to each of the other parts. Data-flow analysis is a key part of code optimization.

6. Compiler Construction Toolkits
Compiler construction toolkits provide an integrated set of routines for constructing the various phases of a compiler.

Programming language basics

Environment and States
The environment is a mapping from names to locations in the store. Since variables refer to locations, we could alternatively define an environment as a mapping from names to variables. The state is a mapping from locations in the store to their values.

[Figure: names → (environment) → locations (variables) → (state) → values]

Static Scope and Block Structure
The scope rules for C are based on program structure. The scope of a declaration is determined implicitly by where the declaration appears in the program. Programming languages such as C++, Java, and C# also provide explicit control over scopes through keywords like public, private, and protected.

A block is a grouping of declarations and statements. C uses braces { and } to delimit a block; some other languages use begin and end instead.

Scopes of declarations
• Declaration D "belongs" to block B if B is the most closely nested block containing D.
• The scope of declaration D is the block containing D and all sub-blocks that do not redeclare D.

[Figure: blocks in a C++ program]

Explicit Access Control
Classes and structures introduce a new scope for their members. If p is an object of a class with a field (member) x, then the use of x in p.x refers to field x in the class definition. Through keywords like public, private, and protected, object-oriented languages such as C++ or Java provide explicit control over access to member names in a superclass.
These keywords support encapsulation by restricting access:
• Public: public names are accessible from outside the class.
• Private: private names are accessible only within the method declarations and definitions associated with that class and any "friend" classes.
• Protected: protected names are accessible to subclasses.

Dynamic Scope
The term dynamic scope usually refers to the following policy: a use of a name x refers to the declaration of x in the most recently called procedure with such a declaration. Dynamic scoping of this type appears only in special situations. The two dynamic policies are:
• Macro expansion in the C preprocessor
• Method resolution in object-oriented programming
Dynamic scoping is very uncommon in the familiar languages.

Parameter Passing Mechanism
Every language has some method for passing parameters to functions and procedures.
• Formal parameters: the identifiers used in a method to stand for the values that are passed into the method by a caller.
• Actual parameters: the actual values that are passed into the method by a caller.

• Call by Value: the actual parameter is evaluated (if it is an expression) or copied (if it is a variable) into the formal parameter.
• Call by Reference: the address of the actual parameter is passed as the value of the corresponding formal parameter.
• Call by Name: the called procedure executes as if the actual parameter were substituted literally for the formal parameter.

[Figure: call by value, call by reference, and call by name]

Aliasing
Aliasing occurs when two names refer to the same location in memory. It is an interesting consequence of call-by-reference parameter passing or its simulation, as in Java, where references to objects are passed by value. It is possible that two formal parameters refer to the same location; such variables are said to be aliases of one another.
As a result, any two variables, which may appear to take their values from two distinct formal parameters, can become aliases of each other.

5. Intermediate Code Generation, Symbol Table, Error Detection and Recovery

Intermediate Code Generation
During the translation of a source program into the object code for a target machine, a compiler may generate a middle-level language code, known as intermediate code or intermediate text. The complexity of this code lies between that of the source language code and the object code. Intermediate code can be represented as postfix notation, syntax trees, directed acyclic graphs (DAGs), three-address code, quadruples, and triples.

[Fig.: Position of the intermediate code generator: parser → static checker → intermediate code generator → (intermediate code) → code generator]

If the compiler translates source code directly into machine code without generating intermediate code, then a full native compiler is required for each new machine. Intermediate code keeps the analysis portion the same for all compilers, so a full compiler is not needed for every unique machine. The intermediate code generator receives its input from its predecessor phase, the semantic analyzer, in the form of an annotated syntax tree. With intermediate code, only the second part of the compiler, the synthesis phase, changes according to the target machine.

Postfix notation
Postfix notation is a useful form of intermediate code when the given language consists of expressions. It is also called "suffix notation" and "reverse Polish". Postfix notation is a linear representation of a syntax tree, and any expression can be written in it unambiguously without parentheses. The ordinary (infix) way of writing the sum of x and y is with the operator in the middle: x + y. In postfix notation, the operator is placed at the right end: x y +. That is, the operator follows the operands.

Example

No. | Production | Semantic rule | Program fragment
1 | E → E1 op E2 | E.code = E1.code || E2.code || op | print op
2 | E → (E1) | E.code = E1.code |
3 | E → id | E.code = id | print id

Process of Evaluation of Postfix Expressions
Postfix notation can easily be evaluated by using a stack; the evaluation process generally scans the postfix code from left to right.
1. If the scanned symbol is an operand, it is pushed onto the stack and scanning continues.
2. If the scanned symbol is a binary operator, the two topmost operands are popped from the stack. The operator is applied to these operands, and the result is pushed back onto the stack.
3. If the scanned symbol is a unary operator, it is applied to the top of the stack and the result is pushed back onto the stack.

Parse trees and syntax trees
A parse tree contains more detail than is actually needed, so it is cumbersome for the compiler to process. Take the following parse tree as an example:

[Figure: example parse tree]

• In the parse tree, most of the leaf nodes are the single child of their parent nodes.
• In the syntax tree, this extra information is eliminated.
• A syntax tree is a variant of a parse tree. In the syntax tree, interior nodes are operators and leaves are operands.
• A syntax tree is usually used to represent a program in a tree structure.

A sentence id + id * id would have the following syntax tree:

[Figure: syntax tree for id + id * id]

The abstract syntax tree can be represented as:

      +
     / \
   id   *
       / \
     id   id

Abstract syntax trees are important data structures in a compiler. They contain the least amount of unnecessary information, are more compact than parse trees, and can easily be used by a compiler.

Three address code
• Three-address code is an intermediate code. It is used by optimizing compilers.
• In three-address code, the given expression is broken down into several separate instructions. These instructions can easily be translated into assembly language.
• Each three-address code instruction has at most three operands. It is a combination of an assignment and a binary operator.

Example
[Given expression and its three-address code: illegible in the source]
In the three-address code, the temporaries named t are used as registers in the target program.

The three-address code can be represented in two forms: quadruples and triples.
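The breakdown of an expression into three-address instructions can be sketched with the same stack discipline used for evaluating postfix expressions: instead of computing a value when an operator is scanned, the translator emits one instruction and pushes a fresh temporary. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the function name postfix_to_tac, the token uminus for unary minus, and the sample expression a = b * -c + d are all hypothetical and do not come from the notes.

```python
def postfix_to_tac(tokens, target):
    """Translate a postfix token list into three-address instructions.

    tokens: operands and operators in postfix order (assumed already split).
    target: name that receives the final result.
    """
    binary_ops = {"+", "-", "*", "/"}
    stack = []        # names of operands: identifiers or temporaries
    code = []         # emitted three-address instructions
    temp_count = 0

    def new_temp():
        # Hand out t1, t2, ... as fresh temporaries.
        nonlocal temp_count
        temp_count += 1
        return f"t{temp_count}"

    for tok in tokens:
        if tok in binary_ops:
            # Binary operator: pop two operands, emit one instruction.
            right = stack.pop()
            left = stack.pop()
            temp = new_temp()
            code.append(f"{temp} = {left} {tok} {right}")
            stack.append(temp)
        elif tok == "uminus":
            # Unary operator (hypothetical token name): pop one operand.
            operand = stack.pop()
            temp = new_temp()
            code.append(f"{temp} = - {operand}")
            stack.append(temp)
        else:
            # Operand: push it and keep scanning.
            stack.append(tok)

    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    code.append(f"{target} = {stack.pop()}")
    return code


# a = b * -c + d, written in postfix as: b c uminus * d +
for instruction in postfix_to_tac(["b", "c", "uminus", "*", "d", "+"], "a"):
    print(instruction)
```

For the sample input this prints t1 = - c, t2 = b * t1, t3 = t2 + d and a = t3, one instruction per line. Each instruction has at most three operands, and each temporary stands for a value that the code generator could later place in a register.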
