SYLLABUS
Compiler Design - 3170701

Examination Marks
  Theory Marks : ESE 70, PA 30
  Practical Marks : ESE 30, PA 20
  Total Marks : 150

1. Overview of the Compiler and its Structure - Language processor, Applications of language processors, Definition-Structure-Working of compiler, the science of building compilers, Basic understanding of interpreter and assembler. Difference between interpreter and compiler. Compilation of source code into target language, Cousins of compiler, Types of compiler. (Chapter - 1)
2. Lexical Analysis - The Role of the Lexical Analyzer, Specification of Tokens, Recognition of Tokens, Input Buffering, elementary scanner design and its implementation (Lex), Applying concepts of Finite Automata for recognition of tokens. (Chapter - 2)
3. Syntax Analysis - Understanding Parser and CFG (Context Free Grammars), Left Recursion and Left Factoring of grammar, Top Down and Bottom up Parsing Algorithms, Operator-Precedence Parsing, LR Parsers, Using Ambiguous Grammars, Parser Generators, Automatic Generation of Parsers. Syntax-Directed Definitions, Construction of Syntax Trees, Bottom-Up Evaluation of Definitions, syntax directed definitions and translation

TABLE OF CONTENTS

Chapter - 1 Overview of the Compiler and its Structure (1 - 1) to (1 - 28)
1.1 Language Processor
1.2 Definition of Compiler
1.3 Analysis-Synthesis Model
1.4 Phases of Compiler
1.5 The Science of Building Compilers
1.5.1 Modeling in Compiler Design and Implementation
1.5.2 Code Optimization
1.6 Applications of Language Processors
1.6.1 Optimization for High Level Programming Languages
1.6.2 Optimization for Computer Architecture
1.6.3 Designing of New Computer Architecture
1.6.4 Program Translation
1.6.5 Software Productivity Tools
1.7 Basic Understanding of Interpreter and Assembler
1.8 Difference between Interpreter and Compiler

Chapter - 2 Lexical Analysis (2 - 1) to (2 - 82)
2.1 The Role of the Lexical Analyzer
2.2 Specification of Tokens
2.2.1 Strings and Language
2.2.2 Operations on Language
2.2.3 Regular Set
2.2.4 Regular Expressions
2.3 Recognition of Tokens
2.4 Input Buffering
2.5 Elementary Scanner Design and its Implementation (Lex)
2.5.1 Structure of LEX
2.5.2 LEX Programs
2.6 Applying Concepts of Finite Automata for Recognition of Tokens
2.6.1 Construction of NFA from Regular Expression
2.7 Design of Lexical Analyzer Generator
2.7.1 Transition Diagram for Programming Constructs
2.8 Optimization of DFA
2.8.1 Deterministic Finite Automata (DFA)
2.8.2 NFA to DFA Conversion
2.9 Short Questions and Answers
2.10 Multiple Choice Questions with Answers

Chapter - 3 Syntax Analysis (3 - 1) to (3 - 156)
3.1 Understanding Parser and CFG (Context Free Grammars)
3.1.1 Role of Parser
3.1.2 Why Lexical and Syntax Analyzers are Separated Out ?
3.3.3.1 Construction of Predictive LL(1) Parser
3.4 Bottom Up Parsing
3.4.1 Shift Reduce Parser
3.5 Operator-Precedence Parser
3.5.1 Operator Precedence Parsing Algorithm
3.5.2 Precedence Functions
3.6 LR Parsers
3.7 Simple LR Parsing (SLR)
3.8 LR(k) Parser
3.9 LALR Parser
3.10 Comparison of LR Parsers
3.11 Using Ambiguous Grammars
3.12 Parser Generators
3.13 Automatic Generation of Parser
3.14 Short Questions and Answers
3.15 Multiple Choice Questions with Answers

Chapter - 4 Syntax Directed Translation (4 - 1) to (4 - 38)
4.1 Introduction
4.2 Syntax Directed Definitions (SDD)
4.3 Construction of Syntax Trees
4.3.1 Construction of Syntax Tree for Expression
4.4 Bottom Up Evaluation of S-Attributed Definitions
4.4.1 Synthesized Attributes on the Parser Stack
4.5 L-Attributed Definition
4.6 Syntax Directed Definitions and Translation Schemes
4.6.1 Guideline for Designing the Translation Scheme
4.7 Short Questions and Answers
4.8 Multiple Choice Questions with Answers

Chapter - 5 Error Recovery
5.2 Error Detection and Recovery
5.2.1 Ad Hoc and Systematic Methods
5.3 Short Questions and Answers

Chapter - 6 Intermediate Code Generation (6 - 1) to (6 - 56)
6.1 Introduction to Intermediate Code
6.1.1 Benefits of Intermediate Code Generation
6.1.2 Properties of Intermediate Languages
6.2 Variants of Syntax Trees
6.3 Three Address Code
6.3.1 Merits and Demerits of Quadruple, Triple and Indirect Triples
6.4 Syntax Directed Translation Mechanisms
6.5 Types and Declarations
6.6 Translation of Expressions
6.7 Arrays
6.8 Boolean Expressions
6.8.1 Numerical Representation
6.8.2 Flow of Control Statements
6.9.1.1 Type Expression
6.9.2 Specification of Simple Type Checker
6.9.2.1 Type Checking of Expression
6.9.2.2 Type Checking of Statements
6.10 Short Questions and Answers

7.2 Storage Organization
7.3 Storage Allocation Strategies
7.3.1 Static Allocation
7.3.2 Stack Allocation
7.3.3 Heap Allocation
7.3.4 Comparison between Static, Stack and Heap Allocation
7.4 Storage Allocation Space
7.4.1 Activation Record
7.5 Block Structure and Non Block Structure Storage Allocation
7.5.1 Access to Non Local Names
7.5.1.1 Static Scope or Lexical Scope
7.5.1.2 Lexical Scope for Nested Procedure
7.6 Parameter Passing
7.7 Heap Management
7.7.1 Memory Manager
7.7.2 Memory Hierarchy
7.7.3 Locality in Programs
7.7.4 Fragmentation
7.8 Short Questions and Answers
7.9 Multiple Choice Questions with Answers

Chapter - 8 Code Generation and Optimization (8 - 1) to (8 - 58)
8.1 Code Generation
8.2 Issues in the Design of a Code Generator
8.3 The Target Language
8.3.1 Cost of the Instruction
8.4 Basic Blocks and Flow Graphs
8.4.1 Some Terminologies used in Basic Blocks
8.4.2 Algorithm for Partitioning into Blocks
8.4.3 Flow Graph
8.5 Loops in Flow Graph
8.6 Next Use Information
8.6.1 Storage for Temporary Names
8.7 The DAG Representation of Basic Blocks
8.7.1 Algorithm for Construction of DAG
8.7.2 Applications of DAG
8.7.3 DAG based Local Optimization
8.8 Machine Dependent Optimization
8.8.1 Characteristics of Peephole Optimization
8.9 Simple Code Generator
8.10 Register Allocation and Assignment
8.10.1 Global Register Allocation
8.10.2 Usage Count
8.10.3 Register Assignment for Outer Loop
8.10.4 Graph Coloring for Register Assignment
8.11 More Examples on Code Generation
8.12 Machine Independent Optimization
8.13 Few Selected Optimization
8.13.1 Compile Time Evaluation
8.13.2 Common Sub Expression Elimination
8.13.3 Variable Propagation
8.13.4 Code Movement
8.13.5 Strength Reduction
8.13.6 Dead Code Elimination
8.14 Loop Optimization

9.1.1 Instruction Pipelines and Branch Delays
9.1.2 Pipelined Execution
9.1.3 Multiple Instruction Issue
9.2 Code-Scheduling Constraints
9.2.1 Data Dependence
9.2.2 Finding Dependences Among Memory Accesses
9.2.3 Tradeoff between Register Usage and Parallelism
9.2.4 Phase Ordering between Register Allocation and Code Scheduling
9.2.5 Control Dependence
9.2.6 Speculative Execution Support
9.2.7 Basic Machine Model
9.3 Basic-Block Scheduling
9.3.1 Data Dependence Graph
9.3.2 List Scheduling Algorithm
9.4 Pass Structure of Assembler

1 Overview of the Compiler and its Structure

Syllabus
Language processor, Applications of language processors, Definition-Structure-Working of compiler, the science of building compilers, Basic understanding of interpreter and assembler. Difference between interpreter and compiler. Compilation of source code into target language, Cousins of compiler, Types of compiler.

Contents
1.1 Language Processor
1.2 Definition of Compiler . . . May-12, Marks 6
1.3 Analysis-Synthesis Model
    May-12, Winter-19, Marks 6
1.4 Phases of Compiler . . . Winter-12, 13, 17, 20, Summer-14, 16, 17, 19, Marks 7
1.5 The Science of Building Compilers
1.6 Applications of Language Processors
1.7 Basic Understanding of Interpreter and Assembler
1.8 Difference between Interpreter and Compiler . . . Winter-17, 20, Summer-19, Marks 4
1.9 Compilation of Source Code into Target Language . . . Winter-3, Marks 8
1.10 Cousins of Compiler . . . May-12, Winter-18, Marks 4
1.11 Types of Compiler
1.12 Short Questions and Answers
1.13 Multiple Choice Questions

1.1 Language Processor
A translator is a kind of program that takes one form of program as input and converts it into another form. The input program is called the source program and the output program is called the target program.
The source language can be a low level language like assembly language or a high level language like C, C++, FORTRAN.
The target language can be a low level language or a machine language.

Fig. 1.1.1 Translator (source language → translator → target language)

Types of Translator
There are two types of translators : compiler and assembler.

Basic Functions of Translator
1. The translator is used to convert one form of program to another.
2. The translator should convert the source program in such a way that the generated target code is easy to understand.
3. The translator should preserve the meaning of the source code.
4. The translator should report errors that occur during compilation to its users.
5. The translation must be done efficiently.

1.2 Definition of Compiler
In this section we will discuss two things : "What is compiler ?" and "Why to write compiler ?" Let us start with "What is compiler ?"
A compiler is a program which takes one language (source program) as input and translates it into an equivalent another language (target program).
During this process of translation, if some errors are encountered then the compiler displays them as error messages. The basic model of compiler can be represented as follows.
Fig.
1.2.1 Compiler

TECHNICAL PUBLICATIONS® - an up-thrust for knowledge

The compiler takes a source program written in a higher level language such as C, PASCAL, FORTRAN and converts it into a low level language or a machine level language such as assembly language.
Compilers and assemblers are two translators. Translators convert one form of program into another form. The compiler converts high level language to machine level language. The assembler converts assembly language program to machine level language.

Factors that Affect the Design of Compiler are -
1. The choice of source language.
2. The machine architecture on which compiler is executed.
3. The amount of memory available.
4. The type of object code required.

Major Functions Done by Compiler
1. The compiler translates a high level source program to machine program.
2. It raises error messages, if any, during the process of compilation.
3. The translation of source language to machine language must be done efficiently.
4. While translating, the compiler preserves the meaning of the code.

1.3 Analysis-Synthesis Model
The compilation can be done in two parts : analysis and synthesis. In the analysis part the source program is read and broken down into constituent pieces. The syntax and the meaning of the source string is determined and then an intermediate code is created from the input source program. In the synthesis part this intermediate form of the source language is taken and converted into an equivalent target program. During this process, if certain code has to be optimized for efficient execution then the required code is optimized. The analysis and synthesis model is as shown in Fig. 1.3.1.

Review Question
1. Explain the analysis synthesis model of compilation. List the factors that affect the design of compiler. Also list major functions done by compiler.

(Figure : source program → Compiler [Analysis | Synthesis] → target program)
Fig.
1.3.1 Analysis and synthesis model

1.4 Phases of Compiler
As discussed earlier, the process of compilation is carried out in two parts : analysis and synthesis. The analysis is carried out in the phases of lexical analysis, syntax analysis and semantic analysis. The synthesis is carried out with the help of intermediate code generation, code generation and code optimization. Let us discuss these phases one by one.

1. Lexical Analysis
• The lexical analysis is also called scanning.
• It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.
• A token is a sequence of characters having a collective meaning.
• For example, if some assignment statement in your source program is as follows,
      total = count + rate * 10
  then in the lexical analysis phase this statement is broken up into a series of tokens as follows :
  1. The identifier total
  2. The assignment symbol
  3. The identifier count
  4. The plus sign
  5. The identifier rate
  6. The multiplication sign
  7. The constant number 10
• The blank characters which are used in the programming statement are eliminated during the lexical analysis phase.

2. Syntax Analysis
• The syntax analysis is also called parsing.
• In this phase the tokens generated by the lexical analyser are grouped together to form a hierarchical structure.
• The syntax analysis determines the syntax tree. For the expression total = count + rate * 10 the tree can be generated as follows.

            =
           / \
      total   +
             / \
        count   *
               / \
           rate   10

Fig. 1.4.1 Parse tree for total = count + rate * 10

In the statement 'total = count + rate * 10' first of all rate * 10 will be considered, because in an arithmetic expression the multiplication operation should be performed before the addition. And then the addition operation will be considered. For building such type of syntax tree the production rules are to be designed. The rules are usually expressed by context free grammar. For the above statement the production rules are -
(1) E → identifier
(2) E → number
(3) E → E + E
(4) E → E * E
(5) E → (E)
where E stands for an expression.
• By rule (1) count and rate are expressions and
• by rule (2) 10 is also an expression.
• By rule (4) we get rate * 10 as expression.
• And finally, by rule (3), count + rate * 10 is an expression.

3.
Semantic Analysis
• Once the syntax is checked in the syntax analyser phase, the next phase i.e. the semantic analysis determines the meaning of the source string.
• For example, meaning of source string means matching of parenthesis in the expression, or matching of if else statements, or performing arithmetic operations of the expressions that are type compatible, or checking the scope of operation.

Fig. 1.4.2 Semantic analysis (the syntax tree of total = count + rate * 10 with the constant 10 converted by "int to float")

Thus these three phases are performing the task of analysis. After these phases an intermediate code gets generated.

4. Intermediate Code Generation
• The intermediate code is a kind of code which is easy to generate and this code can be easily converted to target code.
• This code is in variety of forms such as three address code, quadruple, triple, postfix.
• Here we will consider an intermediate code in three address code form. This is like an assembly language. The three address code consists of instructions each of which has at the most three operands. For example,
      t1 := int to float (10)
      t2 := rate * t1
      t3 := count + t2
      total := t3

5. Code Optimization
• The code optimization phase attempts to improve the intermediate code.
• This is necessary to have a faster executing code or less consumption of memory.
• Thus by optimizing the code the overall running time of the target program can be improved.

6. Code Generation
• In code generation phase the target code gets generated.
• The intermediate code instructions are translated into sequence of machine instructions.
      MOV rate, R1
      MUL #10.0, R1
      MOV count, R2
      ADD R2, R1
      MOV R1, total

Example - Show how an input a = b + c * 60 gets processed in compiler. Show the output at each stage of compiler. Also show the contents of symbol table.

Symbol table management
• To support these phases of compiler a symbol table is maintained.
The task of symbol table management is to store identifiers (variables) used in the program.
• The symbol table also stores information about attributes of each identifier. The attributes of identifiers are usually its type, its scope, and information about the storage allocated for it.
• The symbol table also stores information about the subroutines used in the program. In case of a subroutine, the symbol table stores the name of the subroutine, number of arguments passed to it, type of these arguments, the method of passing these arguments (may be call by value or call by reference) and return type if any.
• Basically symbol table is a data structure used to store the information about identifiers.
• The symbol table allows us to find the record for each identifier quickly and to store or retrieve data from that record efficiently.
• During compilation the lexical analyzer detects the identifier and makes its entry in the symbol table. However, the lexical analyzer can not determine all the attributes of an identifier and therefore the attributes are entered by the remaining phases of compiler.
• Various phases can use the symbol table in various ways. For example, while doing the semantic analysis and intermediate code generation, we need to know what type of identifiers are. Then during code generation typically information about how much storage is allocated to identifiers is seen.

Error detection and handling
• In compilation, each phase detects errors. These errors must be reported to the error handler whose task is to handle the errors so that the compilation can proceed.
• Normally, the errors are reported in the form of message.
• Large number of errors can be detected in the syntax analysis phase. Such errors are popularly called syntax errors.
• During semantic analysis, type mismatch kind of errors are usually detected.

Example - ... of type float : a = b + c * 2;
Solution : See Fig. 1.4.4 on next page.

Review Questions
3.
Explain different phases of compiler.
4. Explain analysis phase of source program with example.

Fig. 1.4.4 (output of each phase : token stream, semantic tree, intermediate code, optimized code, machine code)

1.5 The Science of Building Compilers
• Compiler is a program that accepts all the source language programs and converts them into machine language. The source program can be large or small and it might contain any programming construct; it is the compiler that converts it into machine code. While translating the code it must preserve the meaning of the source language.

1.5.1 Modeling in Compiler Design and Implementation
• In compiler design we must choose the right design model and the right algorithm. Both the algorithms and the design must be simple and efficient.
• Many compilers make use of the most fundamental models i.e. finite state machines and regular expressions. These are used in the lexical analysis phase for identifying tokens. Then context free languages are also used to describe the syntax of the program.

1.5.2 Code Optimization
• The desired feature of any compiler is to produce optimized code. Optimized code means the code that executes efficiently on the machine. For the compiler, producing optimized code is a complex but important task.
• For optimization, following are the compiler design objectives that are used -
1. When optimization is done then the meaning of the source program must be preserved. This is called correct optimization.
2. Due to optimization the performance of the program must get improved.
3. The compilation time required to produce the optimized code must be reasonable.
4. The efforts required due to optimization must be manageable.

1.6 Applications of Language Processors

1.6.1 Optimization for High Level Programming Languages
• Compiler takes the higher level languages as input and produces the target code.
The high level languages are less efficient as compared to low level languages. Their target code runs slowly. But low level languages are difficult to understand and are not portable. Optimizing compilers generally improve the performance of the generated code by eliminating the abstractions posed by the high level languages.
• As new languages got introduced the newer features are getting added up. But the most commonly addressed programming language features are - aggregated data structures, control flows, loops and high level flow of control.
• These features can introduce overhead. The compiler applies important optimization techniques that extract the required information dynamically and thus reduce the overhead.

1.6.2 Optimization for Computer Architecture
• The high performance systems take advantage of two basic techniques : parallelism and memory hierarchy.
• Parallelism can be exploited at the instruction level. This instruction level parallelism is hidden from the programmer. The compiler rearranges the instructions in such a way that the parallelism can be more effective.
• Memory hierarchy means arranging several levels of storage with different speeds and sizes, with the level closest to the processor being the fastest but the smallest. In the compiler, emphasis is given to making the memory hierarchy efficient.

1.6.3 Designing of New Computer Architecture
• In modern computers several effective architectures are used. To exploit these architectures, the compilers are developed in the processor design stage. The compiled code is run on simulators and then used to evaluate the architectural features. The RISC, CISC architectures are highly influenced by the compilers.
• Various specialized architectures are getting invented. For instance - SIMD (Single Instruction Multiple Data), VLIW (Very Long Instruction Word) machines, systolic arrays, multiprocessors with distributed shared memory. The development in these architectures leads the development and improvement in compiler technologies.

1.6.4 Program Translation
The compilation technology involves translation of the program from high level to machine level language. Following are some important applications of
Following are some important applications a ‘translation techniques - 2 ig; Bren TEGIICA PLBLCATONS® on want weeps Comper Dean 1:15 __Overvw of 0 Comper and ts Stvture Binary Translation + Compiler are used to translate binary code of one machine to run it on another machine. Thus due to binary translation the one machine code can be run on another machine. Hardware Synthesis © There are hardware description languages such as VHDL, Verilog and #9 on ‘These languages help to write a specification file in which the hardware is described. The hardware synthesis tools translate this description into gates and physical layout. These tools help in obtaining the optimizing circuit. ‘Query Interpreter + The SQL (Structured Query Languages) are used to search database. The SQL Interpreters compile of interpret the queries given at the command prompt Compiled Simulation Simulation is a general technique in which the design is validated. Simulation is very expensive. Instead of writing simulator itis faster to compile the design to produce machine code. Compiled simulation is used to simulate design written by the VHDL or Verilogs. Software Productivity Tools + Programs are the most important elements of any software systems. The error in the program can cause crashing of the entie system. Hence testing is done to locate errors in the program. + Data flow analysis technique is used to locate errors along the execution path. This technique used in testing is mainly derived in compilers. Other important techniques of locating erors are - ‘Type Checking The type checking is an effective tol used fo catch the inconsistencies in the data types. When operation with wrong types is carried out then this technique identifies the Bounds Checking @ Boundary value checking is a technique in which array index boundary can be checked using bounds check technique. This technigue is used to check the overflow ofthe bulfers in the program. 

Memory Management Tools
Various tools are developed to help the programmer to find out the memory management errors.

1.7 Basic Understanding of Interpreter and Assembler
• An interpreter is a kind of translator which produces the result directly when the source language and data is given to it as input.
• It does not produce object code; instead the source program is interpreted each time it needs to be executed.
• The model for interpreter is as shown in Fig. 1.7.1.

Fig. 1.7.1 Interpreter (source program and data → direct execution → result)

• Languages such as BASIC, SNOBOL, LISP can be translated using interpreters. JAVA also uses interpreter.
• The process of interpretation can be carried out in following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution

Advantages
• Modification of user program can be easily made and implemented as execution proceeds.
• Type of object that denotes a variable may change dynamically.
• Debugging a program and finding errors is a simplified task for a program used for interpretation.

Compiler is a kind of translator that converts the high level language into the machine language and assembler is a kind of translator that converts the assembly language program into machine language.

1.8 Difference between Interpreter and Compiler
The analysis phase of interpreter and compiler is same i.e. in both lexical, syntactic and semantic analysis is performed.

Sr. No. 1
  Interpreter - Demerit : The source program gets interpreted every time it is to be executed, and every time the source program is analyzed. Hence interpretation is less efficient than compilation.
  Compiler - Merit : In the process of compilation the program is analyzed only once and then the code is generated. Hence compiler is more efficient than interpreter.
Sr. No. 2
  Interpreter - The interpreters do not produce object code.
  Compiler - The compilers produce object code.
Sr. No. 3
  Interpreter - Merit : The interpreters can be made portable because they do not produce object code.
  Compiler - Demerit : The compiler has to be present on the host machine when a particular program needs to be compiled.
Sr. No. 4
  Interpreter - Merit : Interpreters are simpler and give us an improved debugging environment.
  Compiler - Demerit : The compiler is a complex program and it requires large amount of memory.
Sr. No. 5
  Interpreter - An interpreter is a kind of translator which produces the result directly when the source language and data is given to it as input. (source program and data → direct execution → result)
  Compiler - A compiler is a kind of translator which takes only source program as input and converts it into object code. (source program → compiler → object code)

Examples of interpreter : A UPS Debugger is basically a graphical source level debugger but it contains a built in C interpreter which can handle multiple source files.

1.12 Short Questions and Answers

Q.2 What are the commonly used translators ?
Ans. : The assembler and compiler are the two commonly used translators. The assembler converts the source program written in assembly language into equivalent machine code. The compiler converts the source program written in high level language into the equivalent machine code.

Q.3 What is compiler ?
Ans. : The compiler is a kind of translator which converts the high level language into the machine level language. The examples of high level languages are C, Fortran, C++, Pascal and so on.

Q.4 What are the phases of compiler ?
Ans. : Various phases of compiler are lexical analysis, syntax analysis or parsing, semantic analysis, intermediate code generation, code generation and code optimization.

Q.5 What are the cousins of compiler ?
Ans. : Cousins of compiler means the context in which the compiler typically operates. Such contexts are basically the programs such as preprocessors, assemblers, loaders and link editors.

Q.6 What is an interpreter ?
Ans. : An interpreter is a kind of translator which produces the result directly when the source language and data is given to it as input. It does not produce object code; instead the source program is interpreted each time it needs to be executed.

Q.7 What is the advantage of front end and back end model of compiler ?
Ans. : Following are the advantages of using the front end and back end model of the compiler -
• By keeping the same front end and attaching different back ends one can produce a compiler for the same source language on different machines.
• By keeping different front ends and the same back end one can compile several different languages on the same machine.

Q.8 What are machine dependent and machine independent phases ?
Ans. : The machine dependent phases are code generation and code optimization phases. The machine independent phases are lexical analyzers, syntax analyzers, semantic analyzers.

Q.9 What are the factors affecting number of passes in compiler ?
Ans. : Various factors affecting the number of passes in compiler are
1. Forward reference
2. Storage limitations
3. Optimization.

Q.10 Define the term cross compiler.
Ans. : There may be a compiler which runs on one machine and produces the target code for another machine. Such a compiler is called cross compiler.

Q.11 What are the phases included in front end of a compiler ? What does the front end produce ?
Ans. : The front end includes the analysis phases such as lexical analysis and syntax analysis, and it produces an intermediate code.

Q.12 ...
Ans. : i) Preprocessor ii) Assemblers iii) Macro preprocessor iv) Loaders and link editors.

Q.13 How will you group the phases of compiler ?
Ans. : The phases of compiler are grouped in such a way that some phases act as a front end while the remaining phases act as a back end of the compiler. The front end is for analysis of the source program while the back end is for synthesis.

1.13 Multiple Choice Questions

Q.1 What is compiler ?
a) Compiler is an editor.
b) Compiler is a program that converts high level source program into the machine code.
c) Compiler is a program that converts low level source program into the machine code.
d) Compiler is a general purpose application program.

Q.2 What are the stages of compilation process ?
a) Requirement analysis, design, implementation, testing and maintenance.
b) Documentation, coding and testing.
c) Testing and quality assurance.
d) Lexical analysis, syntax analysis, intermediate code generation, code generation and code optimization.

Q.3 The languages such as C, C++, PASCAL and FORTRAN are referred as ______.
a) databases
b) high level programming languages
c) low level programming languages
d) middle level programming languages

Q.4 The definition of interpreter is
a) general purpose application program.
b) representation of the system which is implemented from the design.
c) kind of translator that does the conversion line by line as program runs.
d) a program used for editing the source code.

Q.5 Following languages are translated using the interpreters
a) C, PASCAL, FORTRAN
b) LISP, SNOBOL, JAVA
c) Assembly language
d) Expert system, knowledge based system

Q.6 Translation of low level language to machine code is done by
a) compiler
b) interpreter
c) assembler
d) loader

Q.7 Cross compiler is a compiler
a) that runs on one machine but produces object code for another machine.
b) which is written in a language that is different from the source program.
c) is written in the same language of source program.
d) generates the object code for the host machine only.

Q.8 Incremental compiler is
a) that runs on one machine but produces object code for another machine.
b) which is written in a language that is different from the source program.
c) is written in the same language of source program.
d) that allows a modified portion of the program to be recompiled.

Q.9 An ideal compiler is
a) that takes less time for compilation.
b) which converts the high level source program to machine level language.
c) which produces the object code which is smaller in size and executes faster.
d) All of the above.

Q.10 Interpreter is
preferred over the compiler because
a) it takes less time to execute.
b) it is helpful in initial phases of program development process.
c) debugging is faster and easier.
d) it requires less number of resources.

Q.11 An interpreter is a program that
a) converts the high level language into a machine level language by producing the object code.
b) produces the results directly when the source language and data is given as input.
c) automates the translation of assembly language into machine language.
d) places the source program in the memory and prepares for execution.

Q.12 Compiler is preferred over the interpreter because
a) it takes less time to execute.
b) it is helpful in initial phases of program development process.
c) debugging is faster and easier.
d) it requires less number of resources.

Q.13 Compiler is a program that
a) converts the high level language into a machine level language by producing the object code.
b) produces the results directly when the source language and data is given as input.
c) automates the translation of assembly language into machine language.
d) places the source program in the memory and prepares for execution.

Q.14 A system program that places an executable program into the memory ...
... assembler

Q.16 Syntax directed translation engines are
a) loaders
b) operating system
c) compilers
d) compiler construction tools

Q.17 Storage mapping is done by
a) loader
b) linker
c) compiler
d) operating system

Q.18 The front end and back end model of compiler is beneficial because
a) it takes less time to execute the source code.
b) same program can be compiled on different machines.
c) it takes less space for execution.
d) programs written in different languages can be compiled by the same compiler.

Q.19 The external references are resolved by
a) loader
c) compilers

Q.20 Compilers are generally written by
a) computer users
b) professional programmers.
c) database administrators.
d) project managers.

Answer Keys for Multiple Choice Questions
[Answer key table]

Chapter-2 Lexical Analysis

Syllabus
The Role of the Lexical Analyzer, Specification of Tokens, Recognition of Tokens, Input Buffering, elementary scanner design and its implementation (Lex), Applying concepts of Finite Automata for recognition of tokens.

Contents
2.1 The Role of the Lexical Analyzer : Winter-15, 19, 20, May-12, Summer-19, Marks 7
2.2 Specification of Tokens : Nov.-11, May-12, Winter-13, 19, Summer-16, Marks 7
2.3 Recognition of Tokens
2.4 Input Buffering : Nov.-11, May-12, Winter-13, 14, 16, 18, 19, 20, Summer-14, 18, Marks 7
2.5 Elementary Scanner Design and its Implementation (Lex) : Winter-20, Marks 4
2.6 Applying Concepts of Finite Automata for Recognition of Tokens
2.7 Design of Lexical Analyzer Generator : Summer-16, 18, 19, Marks 7
2.8 Optimization of DFA : May-12, Winter-13, 15, 16, 18, 19, 20, Summer-15, 16, 17, 18, 19, 20, Marks 7
2.9 Short Questions and Answers
2.10 Multiple Choice Questions

2.1 The Role of the Lexical Analyzer

Lexical analyzer is the first phase of the compiler. The lexical analyzer reads the input source program from left to right one character at a time and generates the sequence of tokens. Each token is a single logical cohesive unit such as identifier, keywords, operators and punctuation marks. Then the parser can use these tokens to determine the syntax of the source program. The role of lexical analyzer in the process of compilation is as shown below -

Fig. 2.1.1 Role of lexical analyzer

As the lexical analyzer scans the source program to recognize the tokens, it is also called a scanner. Apart from token identification, the lexical analyzer also performs the following functions.

Functions of lexical analyzer
1. It produces a stream of tokens.
2. It eliminates blanks and comments.
3. It generates the symbol table, which stores the information about identifiers and constants encountered in the input.
4.
It keeps track of line numbers.
5. It reports the errors encountered while generating the tokens.

The lexical analyzer works in two phases. In the first phase it performs the scan and in the second phase it does lexical analysis, which means it generates the series of tokens.

Tokens, Patterns, Lexemes

Let us learn some terminologies which are frequently used when we talk about the activity of lexical analysis.

Tokens : It describes the class or category of input string. For example, identifiers, keywords, constants are called tokens.
Patterns : Set of rules that describe the token.
Lexemes : Sequence of characters in the source program that are matched with the pattern of the token. For example, int, i, num, ans, choice.

Let us take one example of a programming statement to clearly understand these terms -

if (a<b)

Here "if", "(", "a", "<", "b", ")" are all lexemes. And "if" is a keyword, "(" is opening parenthesis, "a" is identifier, "<" is an operator and so on.

Now to define the identifier, the pattern could be -
1. Identifier is a collection of letters.
2. Identifier is a collection of alphanumeric characters, and the beginning character of the identifier should necessarily be a letter.

When we want to compile a given source program, we submit that program to the compiler. A compiler scans the source program and produces a sequence of tokens; therefore the lexical analyzer is also called a scanner.

Example : Generate appropriate tokens for the given piece of source code.

int MAX (int a, int b)
{
if (a>b)
return a;
else
return b;
}

Solution :

Lexeme      Token
int         keyword
MAX         identifier
(           operator
int         keyword
a           identifier
,           operator
int         keyword
b           identifier
)           operator
{           operator
if          keyword
...

The lexemes above match the following tokens and patterns.
1) Identifier
i) Identifier is a collection of alphanumeric characters.
ii) The first character of identifier must be a letter.
2) Operator
i) Operator can be arithmetic, logical, relational operators.
ii) The parentheses are considered as operators.
iii) Comma is treated as separation operator.
iv) Assignment is denoted by = operator.
3) Keyword
i) Keywords are special words to which some meaning is associated.
ii) int, void are keywords for denoting data types.

Review Questions
1. How do the parser and scanner communicate ? Explain with the block diagram the communication between them.
2. Define lexemes, patterns and tokens.

2.2 Specification of Tokens

To specify tokens, regular expressions are used. When a pattern is matched by some regular expression then the token can be recognized. Let us understand the fundamental concepts of language.

2.2.1 Strings and Language

String is a collection of a finite number of alphabets or letters. The strings are synonymously called words.
• The length of a string S is denoted by |S|.
• The empty string is denoted by ε.
• The empty set of strings is denoted by Φ.

Term                   Meaning
Prefix of string       A string obtained by removing zero or more tail symbols of the string.
Suffix of string       A string obtained by removing zero or more leading symbols of the string.
Substring              A string obtained by removing a prefix and a suffix of a given string. For example, for string Hindustan the string 'ind' can be a substring.
Sequence of string     Any string formed by removing zero or more, not necessarily contiguous, symbols of the string.

2.2.2 Operations on Language

As we have seen, the language is a collection of strings. There are various operations which can be performed on the language.

Operation                            Description
Union of two languages L1 and L2     L1 ∪ L2 = {set of strings in L1 and strings in L2}
Concatenation of L1 and L2           L1.L2 = {set of strings in L1 followed by set of strings in L2}
Kleene closure of L                  L* denotes zero or more concatenations of L.
Positive closure of L                L+ denotes one or more concatenations of L.

For example, let L be the set of alphabets such as L = {A, B, C, ..., Z, a, b, c, ..., z} and D be the set of digits such as D = {0, 1, 2, ..., 9}. Then by performing the various operations discussed above, new languages can be generated as follows -
• L ∪ D is a set of letters and digits.
• LD is a set of strings consisting of a letter followed by a digit.
• L^5 is a set of strings having length of 5 each.
• L* is a set of all the strings including ε.
• L+ is a set of all the strings except ε.

2.2.3 Regular Set

The finite set which denotes a regular language, and the set which can be described by a regular expression, is called a regular set.
For example : A set of identifiers is a regular set because it can be represented using a regular expression.

2.2.4 Regular Expressions

Regular expressions are mathematical symbolisms which describe the set of strings of a specific language. They provide a convenient and useful notation for representing tokens. Here are some rules that describe the definition of the regular expressions over the input set denoted by Σ.
1. ε is a regular expression that denotes the set containing the empty string.
2. If R1 and R2 are regular expressions then R = R1 + R2 (the same can also be represented as R = R1 | R2) is also a regular expression, which represents the union operation.
3. If R1 and R2 are regular expressions then R = R1.R2 is also a regular expression, which represents the concatenation operation.
4. If R1 is a regular expression then R = R1* is also a regular expression, which represents the Kleene closure.

A language denoted by regular expressions is said to be a regular set or a regular language. Let us see some examples of regular expressions.

Example : Write a Regular Expression (R.E.) for the language containing the strings of length two over Σ = {0, 1}.
Solution : R.E. = (0+1) (0+1)

Example : Write a regular expression for the language containing strings which end with "ab" over Σ = {a, b}.

Example : Write a regular expression for recognizing an identifier.
Solution : For denoting an identifier we will consider a set of letters and digits, because an identifier is a combination of letters, or letters and digits, but always having a letter as the first character. Hence R.E. can be denoted as,
R.E. = letter (letter + digit)*
where letter = (A, B, ..., Z, a, b, ..., z) and digit = (0, 1, 2, ..., 9).
Example : Design a regular expression for the language containing all the strings with any number of a's and b's.
Solution : The R.E. will be
R.E. = (a + b)*
The set for this R.E. will be -
L = {ε, a, aa, ab, b, ba, bab, abab, ... any combination of a and b}
The (a + b)* means any combination of a and b, even a null string.

Example : Construct a regular expression for the language containing all strings having any number of a's and b's except the null string.
Solution : R.E. = (a + b)+
This regular expression will give the set of strings of any combination of a's and b's except a null string. The + is called positive closure; it indicates that there is no null string.

Example : Construct the R.E. for the language accepting all the strings which are ending with 00 over Σ = {0, 1}.
Solution : The R.E. has to be formed in which at the end there should be 00. Thus
R.E. = (Any combination of 0's and 1's) 00
R.E. = (0+1)* 00
Thus the valid strings are 100, 0100, 1000, ... We have all strings ending with 00.

Example : Write R.E. for the language accepting the strings which are starting with 1 and ending with 0, over the set Σ = {0, 1}.
Solution : The first symbol in R.E. should be 1 and the last symbol should be 0. So,
R.E. = 1 (0+1)* 0
Note that the condition is strictly followed by keeping the starting and ending symbols correct. In between them there can be any combination of 0 and 1, including the null string.

Example : Write a regular expression to denote the language L over Σ*, where Σ = {a, b, c}, in which every string is such that any number of a's is followed by any number of b's, followed by any number of c's.
Solution : Any number of a's means a*. Similarly any number of b's and any number of c's means b* and c*. So the regular expression is
R.E. = a* b* c*

Example : Write R.E. to denote a language L over Σ*, where Σ = {a, b}, such that the 3rd character from the right end of the string is always a.
Solution :
any number of a's and b's | a | either a or b | either a or b
                            3        2               1
R.E.
= (a+b)* a (a+b) (a+b)
Thus the valid strings are babb, baaa or abb, baba or aaab and so on.

Example : Construct R.E. for the language containing exactly two b's over Σ = {a, b}.
Solution : There should be exactly two b's.
any number of a's | b | any number of a's | b | any number of a's
Hence R.E. = a* b a* b a*
a* indicates that the string contains either any number of a's or a null string. Thus we can derive any string having exactly two b's and any number of a's.

Example : Write a regular definition for the language of all strings of 0's and 1's with an even number of 0's and an odd number of 1's.
Solution : ...

Example : Find the regular expression corresponding to the given statement ... The language of all strings containing 0's and 1's, both of which are even in number.
Solution : ...

Example : Write a regular expression for the strings of even length over {a, b}.
Solution : (aa + ab + ba + bb)*

Example : Write regular definition for
1. The language of all strings that do not end with 01.
2. All strings of digits that contain no leading 0's.
Solution :
(1) r.e. = (0+1)* (00 + 11 + 10)
(2) r.e. = ([1-9] [0-9]*)

Review Question
1. What is regular expression ? State the algebraic properties of regular expression.

2.3 Recognition of Tokens

For a programming language there are various types of tokens such as identifiers, keywords, constants, operators and so on. The token is usually represented by a pair : token type and token value.

The token type tells us the category of the token and the token value gives us the information regarding the token. The token value is also called token attribute. During the lexical analysis process the symbol table is maintained. The token value can be a pointer to the symbol table in case of identifiers and constants. The lexical analyzer reads the input program and generates a symbol table for tokens.

For example : We will consider some encoding of tokens as follows.

Token         Code    Value
if            1       -
else          2       -
while         3       -
for           4       -
identifier    5       Ptr to symbol table
constant      6       Ptr to symbol table
<             7       1
<=            7       2
>             7       3
>=            7       4
!=            7       5
(             8       1
)             8       2
+             9       1
-             9       2
=             10      -

Consider a program code as

if ( ...

2.5 Elementary Scanner Design and its Implementation (Lex)

Various tools have been built for constructing lexical analyzers using the special purpose notations called regular expressions.
• Basically LEX is a unix utility which generates the lexical analyzer.
• A LEX lexer is very much faster in finding the tokens as compared to a handwritten lexer program in C.

Fig. 2.5.1 Generation of lexical analyzer using LEX (Lex specification file x.l, LEX compiler, lex.yy.c, C compiler, a.out (executable program); input strings from source program, a.out, stream of tokens)

• LEX scans the source program in order to get the stream of tokens, and these tokens are related together so that various programming constructs such as expressions, block statements, procedures, control structures can be realised.
• The LEX specification file can be created using the extension .l (often pronounced as dot L). For example, the specification file can be x.l
• This x.l file is then given to the LEX compiler to produce lex.yy.c
• This lex.yy.c is a C program which is actually a lexical analyzer program. The LEX specification file stores the regular expressions for the tokens, and the lex.yy.c file consists of the tabular representation of the transition diagrams constructed for the regular expressions. The lexemes can be recognized with the help of this tabular representation of the transition diagrams.
• Finally the C compiler compiles this generated lex.yy.c and produces an object program a.out. When some input stream is given to a.out then a sequence of tokens gets generated. The scenario described above is modelled in Fig. 2.5.1.

2.5.1 Structure of LEX

Now the question arises : how do we write the specification file ? Well, the LEX program consists of three parts -
1. Declaration section
2. Rule section and
3. Procedure section.

There are 3 sections in the above program. The section starting with %{ and ending with %} is the definition section.
The section starting with %% is called the rule section. This section is closed by %%. The part within %% ... %% consists of regular expressions and actions. The first rule gives the definition of noun and the second rule gives the definition of verb.

The third section consists of two functions : the main function and the yywrap function. In the main function a call to the yylex routine is given. This function is defined in the lex.yy.c program. First we will compile our above program (x.l) using the lex compiler, and then the lex compiler will generate a ready C program named lex.yy.c. This lex.yy.c makes use of the regular expressions and corresponding actions defined in x.l. Hence our above program x.l is called the lex specification file.

When we compile the lex.yy.c file using the command cc (here cc means compile C), we get an output file named a.out. (This is the default output file on the LINUX platform.) On execution of a.out we can give the input string. The following commands are used to run the lex program x.l.

lex x.l          This command generates lex.yy.c
cc lex.yy.c      This command compiles lex.yy.c (sometimes gcc can be used)
./a.out          This command runs the executable file

After entering these commands a blank space for entering input becomes available. There we can give some valid input.

Rama eats
Noun verb
Seeta sings
Noun verb

Then press either control + c or control + d to come out of the output.

Notations used in Regular Expressions of LEX

Regular expression    Meaning
*                     Matches with zero or more occurrences of the preceding expression. For example, 1* matches occurrence of 1 any number of times.
.                     Matches any single character other than the new line character.
[ ]                   A character class which matches any character within the brackets. For example [a-z] matches with any alphabet in lower case.
( )                   Groups regular expressions together into a new regular expression.
...
^                     Used for negation. For example, [^verb] means except verb match with anything else.
\                     Used as escape metacharacter.
                      For example, \n is a newline character, \# prints the # literally.
|                     Represents the or, i.e. another alternative. For example, a | b means match with either a or b.

Built-in Variables

yyin        Of the type FILE*. This points to the current file being parsed by the lexer. It is the standard input file that stores the input source program.
yyout       Of the type FILE*. This points to the location where the output of the lexer will be written. By default, yyin and yyout point to standard input and standard output.
yytext      The text of the matched pattern is stored in this variable (char*), i.e. when the lexer matches or recognizes a token from the input, the lexeme is stored in a null terminated string called yytext. Thus the current token is returned by this variable.
yyleng      Gives the length of the matched pattern. The value in yyleng is the same as strlen(yytext).
yylineno    Provides the current line number information.
yylval      This is a global variable used to store the value of any token.

Built-in Functions

yylex()     This is the starting point of lex from which scanning of the source program starts.
yywrap()    This function is called when end of file is encountered. If yywrap returns 0 the scanner continues scanning; if it returns 1 the scanner does not return tokens.
yyless(n)   This function can be used to push back all but the first n characters of the read token.
yymore()    This function tells the lexer to attach the next token to the current token.
yyerror()   This function is used for displaying error messages.

2.5.2 LEX Programs

LEX is a scanner program which scans the input. We can further analyse the input by counting the number of words, number of characters and total number of lines appearing within it. Here is a simple program using LEX which counts the total number of words, number of lines and number of characters appearing in the given input.

LEX Program

%{
#include <stdio.h>
int Char_Cnt = 0, Word_Cnt = 0, Line_Cnt = 0;
%}
/* assumed definition : a word is a run of non-blank characters */
word [^ \t\n]+
%%
{word}   { Word_Cnt++; Char_Cnt += yyleng; }
\n       { Char_Cnt++; Line_Cnt++; }
.        { Char_Cnt++; }
%%
/* minimal procedure section, assumed : run the scanner and print the counts */
int main(void)
{
    yylex();
    printf("%d %d %d\n", Line_Cnt, Word_Cnt, Char_Cnt);
    return 0;
}
int yywrap(void) { return 1; }
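What the counting rules compute can be cross-checked outside LEX. The following is a minimal Python sketch, not a replacement for the LEX program : it assumes that a word is a maximal run of characters other than blank, tab and newline, and the names RULES and count are hypothetical. The three alternatives play the role of the three rules; since they never match the same text, the order of alternatives does not affect the result.

```python
import re

# Assumption: a "word" is a maximal run of characters other than blank,
# tab and newline (a common Lex definition: word [^ \t\n]+).
RULES = re.compile(r"(?P<word>[^ \t\n]+)|(?P<eol>\n)|(?P<other>.)")


def count(text):
    """Return (characters, words, lines), updating the counters rule by rule."""
    char_cnt = word_cnt = line_cnt = 0
    for match in RULES.finditer(text):
        if match.lastgroup == "word":        # rule {word}
            word_cnt += 1
            char_cnt += len(match.group())   # len(...) plays the role of yyleng
        elif match.lastgroup == "eol":       # rule \n
            char_cnt += 1
            line_cnt += 1
        else:                                # rule .  (a blank or a tab here)
            char_cnt += 1
    return char_cnt, word_cnt, line_cnt


if __name__ == "__main__":
    # 22 characters, 4 words, 2 lines
    print(count("Rama eats\nSeeta sings\n"))   # prints (22, 4, 2)
```

Every input character is consumed by exactly one alternative, which mirrors how a scanner partitions its input into lexemes.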
LEX Program Using Command Line Argument

The command line parameters are the parameters that appear on the command line. The command line interface is the interface which allows the user to interact with the computer by typing the commands. In C we can pass these parameters to the main function in the form of a character array.

As there are three such parameters present at the command line interface, the argc value will be 3.

Let us now discuss how to handle the command line parameters using a LEX specification file.

Program explanation : In the above program we have given the input to the program ... so that argv[0] = "./a.out" and ... parameter. Note that we have opened the file ... in read mode ... in the main() ...

The above program is very simple. We have maintained one temp array in which the input expression is copied. For example ... As soon as we come across the closing bracket, the 'closebracket' counter will be incremented. The regular expression specifies that the opening bracket should occur prior to the closing bracket. This whole logic should be worked out until we come across a semicolon. If both the openbracket and closebracket counters are the same then declare "well formed input!", otherwise declare 'not well formed'.

">"    {printf("\n\t%s is a GREATER THAN OPERATOR", yytext);}
"<="   {printf("\n\t%s is a LESS THAN EQUAL TO OPERATOR", yytext);}
">="   {printf("\n\t%s is a GREATER THAN EQUAL TO OPERATOR", yytext);}
"=="   {printf("\n\t%s is a EQUAL TO OPERATOR", yytext);}
"!="   {printf("\n\t%s is a NOT EQUAL TO OPERATOR", yytext);}
\"[^"\n]*\"   {printf("\n\t%s is a STRING", yytext);}   /* assumed pattern for a string */
%%
int main(int argc, char *argv[])
{
    FILE *file;
    file = fopen(argv[1], "r");
    if (!file)
    {
        printf("Could not open %s\n", argv[1]);
        exit(0);
    }
    yyin = file;
    ...

Following is a C program which contains a fragment of code.
This file is taken as an input by the above LEX program and then the output is produced.

Name : test.c

2.6 Applying Concepts of Finite Automata for Recognition of Tokens

There is a close relationship between a finite automata and the regular expression. We can show this relation in Fig. 2.6.1.

Fig. 2.6.1 Relationship between FA and regular expression

The Fig. 2.6.1 shows that it is convenient to convert the regular expression to NFA with ε moves. Let us see the theorem based on this conversion.

2.6.1 Construction of a NFA from Regular Expression (Thompson's Construction)

To construct NFA from regular expression r, the Thompson's construction is used. The input string of r is parsed and the following constructs are used to build an equivalent NFA.
1. When r = ε the NFA is as shown in the figure.
2. When r = a for Σ = {a} the NFA is as shown in the figure.
3. When r = r1 + r2 the NFA can be drawn as in Fig. 2.6.4. Here N(r1) represents the NFA for regular expression r1 and N(r2) represents the NFA for r2.
4. When r = r1 r2 the NFA for r1 is followed by the NFA for r2.
5. When r = r1* then the NFA can be drawn as in Fig. 2.6.6.
Thus these constructs are useful for creating an NFA equivalent to a regular expression.

Example : Construct NFA equivalent to r = a* b.
Solution : We will parse the regular expression r and construct the NFA. Let r = r1 r2. We can say r1 = a* and r2 = b. Let us draw the NFA for r1 = a* as -

Fig. 2.6.7 (a)

The r = a* b can be drawn as

Fig. 2.6.7 (b)

Example : Construct NFA for the regular expression r = (a + b)* ab.
Solution : As r = (a + b)* ab, we will build the NFA for (a + b)*.

Fig. 2.6.8 (a)

Then we will build r = (a + b)* ab. (Refer Fig. 2.6.8 (b))

Fig. 2.6.8 (b)

2.7 Design of Lexical Analyzer Generator

• To design a lexical analyzer generator, the patterns of regular expressions are designed first.
• These patterns are for recognizing various tokens from the input string.
• From these patterns it is easy to design a Non-deterministic Finite Automata (NFA).
• But the simulation of a DFA is easier by program. Hence we convert the NFA drawn from these patterns to a DFA.

Fig. 2.7.1 Building of lexical analyzer generator (input regular expression, NFA, DFA)

Let us understand the process of pattern matching with the help of a suitable example.

Example : A LEX program is given below -
Auxiliary Definitions
(none)
Translation rules
...
Implement the LEX program as DFA.

Solution : We will implement the given LEX program as a DFA in the following steps.

Step 1 : For each regular expression first we will build the Finite Automata (FA).
Pattern 1 : ...
Pattern 2 : ...

Step 4 : Now we will convert the combined NFA to DFA. First of all we will obtain
ε-closure {0} = {0, 1, 3, 7} = [0137] → new state.
δ([0137], a) = {δ(0,a) ∪ δ(1,a) ∪ δ(3,a) ∪ δ(7,a)}
             = φ ∪ 2 ∪ 4 ∪ 7
             = {2, 4, 7} = [2,4,7] → new state.
δ([0137], b) = {δ(0,b) ∪ δ(1,b) ∪ δ(3,b) ∪ δ(7,b)}
             = φ ∪ φ ∪ φ ∪ 8
             = [8] → new state.
The pattern announced for [0137] will be none, because 0, 1, 3 and 7 are all non-final states.
As in the above computation two new states, i.e. [2,4,7] and [8], are formed, we will find the input transitions for these states.
δ([247], a) = {δ(2,a) ∪ δ(4,a) ∪ δ(7,a)}
            = φ ∪ φ ∪ 7
            = [7] → new state.
δ([247], b) = {δ(2,b) ∪ δ(4,b) ∪ δ(7,b)}
            = φ ∪ 5 ∪ 8
            = [5,8] → new state.
Continuing in this way, we will obtain the input transitions for the new states that are getting generated. Finally the transition table can be -

[Transition table]

2.7.1 Transition Diagrams for Programming Constructs

Example : Write regular expression for identifier and keyword and design a transition diagram for it.
Solution : r.e. = letter (letter + digit)*

Example : Write a regular expression for defining constants (unsigned numbers). Design the transition graphs for them.
Solution : The constant can be defined in three different ways -
i) r.e. = digit+
ii) r.e. = (digit)+ . (digit)+
iii) r.e. = digit+ (. digit+)? (E (+ | -)? digit+)?

Example : Draw the transition diagram for the relational operators.
Solution : r.e. = ( < | > | <= | ... )

During lexical analysis, the LEX program uses the regular expressions, and the generated lexical analyser writes a procedure or a separate function for each state of the DFA. The symbols along the edges represent the input parameters for these functions. At the final state of the corresponding DFA the corresponding token can be identified.

Example : Unsigned numbers are strings such as 5280, 39.37, 6.336E4 or 1.894E-4. Draw the transition diagram for the above mentioned strings.
Solution : Refer example 2.7.3.

Example : Draw the state transition diagram for the unsigned numbers.
Solution : Refer example 2.7.3.

Example : Write regular definition for signed and unsigned number.
Solution :
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
sign → + | -
num → (sign)? (digit)+

2.8 Optimization of DFA

2.8.1 Deterministic Finite Automata (DFA)

The finite automata is called deterministic finite automata if there is only one path for a specific input from the current state to the next state. For example, the DFA can be shown as below.
From state S0 for input 'a' there is only one path going to the next state. Similarly from S0 there is only one path for input b, going to S1.
The DFA can be represented by the same 5-tuple described in the definition of FSM.

Definition of DFA

A deterministic finite automaton is a collection of the following things -
1) The finite set of states, which can be denoted by Q.
2) The finite set of input symbols Σ.
3) The start state q0 such that q0 ∈ Q.
4) A set of final states F such that F ⊆ Q.
5) The mapping function or transition function denoted by δ. Two parameters are passed to this transition function : one is the current state and the other is the input symbol. The transition function returns a state which can be called the next state. For example, q1 = δ(q0, a) means that from the current state q0, with input a, the next state is q1.

In short, the DFA is a five tuple notation denoted as :
A = (Q, Σ, δ, q0, F)
The name of the DFA is A, which is a collection of the five elements described above.

Example : Draw a DFA for the strings ending with 10.
Solution : r.e. for the strings ending with 10 :
r.e. = (0+1)* 10
DFA will be
[Figure : DFA for (0+1)* 10]

Example : What are regular expressions ? Find the regular expression described by the DFA ({A, B}, {0, 1}, δ, A, {B}), where δ is detailed in the following table
[Transition table]
Please note B is the accepting state. Describe the language defined by the regular expression.
Solution : Regular expression : Refer section 2.2.4.
The transition graph for the DFA is -

Fig. 2.8.3

r.e. = [0* 1 (11)*]
L = {The words that contain any number of zeros but always end with an odd number of ones}

2.8.2 NFA to DFA Conversion

There are three methods to convert NFA to DFA.
1. Using subset construction method
2. Direct method
3. Using DFA tree method
Let us understand how to convert NFA to DFA using the subset construction method.
Method for converting NFA with ε to DFA

Step 1 : Let M = (Q, Σ, δ, q0, F) be an NFA with ε. We have to convert this NFA with ε to an equivalent DFA denoted by
M_D = (Q_D, Σ, δ_D, q0, F_D)
Then obtain
ε-closure(q0) = {p1, p2, p3, ..., pn}
then [p1, p2, p3, ..., pn] becomes the start state of the DFA.
Now [p1, p2, p3, ..., pn] ∈ Q_D.

Step 2 : We will obtain the δ transitions on [p1, p2, p3, ..., pn] for each input.
δ_D([p1, p2, ..., pn], a) = ε-closure(δ(p1,a) ∪ δ(p2,a) ∪ ... ∪ δ(pn,a))
                          = ∪ (i = 1 to n) ε-closure δ(pi, a)
where a is an input ∈ Σ.

Step 3 : The states obtained [p1, p2, p3, ..., pn] ∈ Q_D. A state containing a final state pi is a final state in the DFA.

Definition of ε-closure

The ε-closure(p) is the set of all states which are reachable from state p on ε-transitions such that :
i) ε-closure(p) = p where p ∈ Q.
ii) If there exists ε-closure(p) = {q} and δ(q, ε) = r then ε-closure(p) = {q, r}.

Example : Find the ε-closure for the following NFA with ε.

Fig. 2.8.4

Solution :
ε-closure(q0) = {q0, q1, q2}, which means self state + ε-reachable states.
ε-closure(q1) = {q1, q2}, which means q1 is a self state and q2 is a state obtained from q1 with ε input.
ε-closure(q2) = {q2}

Example : Convert the given NFA with ε to its equivalent DFA.

Fig. 2.8.5

Solution : Let us obtain the ε-closure of each state.
ε-closure(q0) = {q0, q1, q2}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
Now we will obtain the δ' transitions. Let ε-closure(q0) = {q0, q1, q2}; call it state A.
δ'(A, 0) = ε-closure {δ((q0, q1, q2), 0)}
         = ε-closure {δ(q0,0) ∪ δ(q1,0) ∪ δ(q2,0)}
         = ε-closure {q0}
         = {q0, q1, q2} i.e. state A
δ'(A, 1) = ε-closure {δ((q0, q1, q2), 1)}
         = ε-closure {δ(q0,1) ∪ δ(q1,1) ∪ δ(q2,1)}
         = ε-closure {q1}
         = {q1, q2} Call it state B.
δ'(A, 2) = ε-closure {δ((q0, q1, q2), 2)}
         = ε-closure {δ(q0,2) ∪ δ(q1,2) ∪ δ(q2,2)}
         = ε-closure {q2}
         = {q2} Call it state C.
Thus we have obtained
δ'(A, 0) = A
δ'(A, 1) = B
δ'(A, 2) = C

Fig. 2.8.6

Now we will find the transitions on states B and C for each input.
Hence
δ'(B, 0) = ε-closure {δ((q1, q2), 0)}
         = ε-closure {δ(q1,0) ∪ δ(q2,0)}
         = ε-closure {φ} = φ
δ'(B, 1) = ε-closure {δ((q1, q2), 1)}
         = ε-closure {δ(q1,1) ∪ δ(q2,1)}
         = ε-closure {q1}
         = {q1, q2} i.e. state B itself
δ'(B, 2) = ε-closure {δ((q1, q2), 2)}
         = ε-closure {δ(q1,2) ∪ δ(q2,2)}
         = ε-closure {q2}
         = {q2} i.e. state C
Hence
δ'(B, 0) = φ
δ'(B, 1) = B
δ'(B, 2)
= C
The partial transition diagram will be as in Fig. 2.8.7.
Now we will obtain the transitions for C :
δ'(C, 0) = ε-closure {δ(q2, 0)} = ε-closure {φ} = φ
δ'(C, 1) = ε-closure {δ(q2, 1)} = ε-closure {φ} = φ
δ'(C, 2) = ε-closure {δ(q2, 2)} = {q2}
Hence the DFA is as shown in Fig. 2.8.8.
As A = {q0, q1, q2}, in which the final state q2 lies, A is a final state. In B = {q1, q2} the state q2 lies, hence B is also a final state. In C = {q2} the state q2 lies, hence C is also a final state.

Example : Construct the NFA using Thompson's notation for the following regular expression and then convert it to DFA.
a* (c | d) b* f #
Solution :
ε-closure(q0) = {q0, q1}
ε-closure(q1) = {q1}
ε-closure(q2) = {q2, q3, q5, q6, q7}
ε-closure(q3) = {q3}
ε-closure(q4) = {q4, q3, q5, q6, q7}
ε-closure(q5) = {q5, q6, q7}
ε-closure(q6) = {q6}
ε-closure(q7) = {q7}
ε-closure(q8) = {q8, q10, q11, q13}
ε-closure(q9) = {q9, q10, q11, q13}
ε-closure(q10) = {q10, q11, q13}
ε-closure(q11) = {q11}
ε-closure(q12) = {q11, q12, q13}
ε-closure(q13) = {q13}
ε-closure(q14) = {q14}
ε-closure(q15) = {q15}
Let ε-closure(q0) = {q0, q1} be state A.
δ'(A, a) = ε-closure (δ((q0, q1), a))
         = ε-closure (δ(q0,a) ∪ δ(q1,a))
         = ε-closure (q2)
         = {q2, q3, q5, q6, q7} Call it state B.
δ'(A, b) = ε-closure (δ((q0, q1), b))
         = ε-closure (δ(q0,b) ∪ δ(q1,b))
         = ε-closure (φ) = φ
δ'(A, c) = φ
δ'(A, d) = φ
δ'(A, f) = φ
δ'(B, a) = ε-closure (δ((q2, q3, q5, q6, q7), a))
         = ε-closure (δ(q2,a) ∪ δ(q3,a) ∪ δ(q5,a) ∪ δ(q6,a) ∪ δ(q7,a))
         = ε-closure (φ ∪ q4 ∪ φ ∪ φ ∪ φ)
         = ε-closure (q4) = {q3, q4, q5, q6, q7} Call it state C.
δ'(B, a) = C
δ'(B, b) = ε-closure (δ((q2, q3, q5, q6, q7), b))
         = ε-closure (δ(q2,b) ∪ δ(q3,b) ∪ δ(q5,b) ∪ δ(q6,b) ∪ δ(q7,b))
         = ε-closure (φ ∪ φ ∪ φ ∪ φ ∪ φ)
δ'(B, b) = φ
δ'(B,
c) = ε-closure (q8) = {q8, q10, q11, q13} Call it state D.
δ'(B, c) = D
δ'(B, d) = ε-closure (q9) = {q9, q10, q11, q13} Call it state E.
δ'(B, d) = E
δ'(B, f) = φ
δ'(C, a) = ε-closure (δ((q3, q4, q5, q6, q7), a))
         = ε-closure (δ(q3,a) ∪ δ(q4,a) ∪ δ(q5,a) ∪ δ(q6,a) ∪ δ(q7,a))
         = ε-closure (q4) = C
δ'(C, b) = ε-closure (δ((q3, q4, q5, q6, q7), b))
         = ε-closure (δ(q3,b) ∪ δ(q4,b) ∪ δ(q5,b) ∪ δ(q6,b) ∪ δ(q7,b))
         = ε-closure (φ) = φ
δ'(C, c) = ε-closure (δ((q3, q4, q5, q6, q7), c))
         = ε-closure (q8)
δ'(C, c) = D
δ'(C, d) = ε-closure (δ((q3, q4, q5, q6, q7), d))
         = ε-closure (q9)
δ'(C, d) = E
δ'(C, f) = ε-closure (δ((q3, q4, q5, q6, q7), f))
δ'(C, f) = φ
δ'(D, a) = ε-closure (δ((q8, q10, q11, q13), a))
δ'(D, a) = φ
δ'(D, b) = ε-closure (δ(q11, b)) = ε-closure (q12)
         = {q11, q12, q13} Call it state F.
δ'(D, b) = F
δ'(D, c) = δ'(D, d) = φ
δ'(D, f) = ε-closure (q14) i.e. state G
δ'(D, f) = G
δ'(E, a) = ε-closure (δ((q9, q10, q11, q13), a)) = φ
δ'(E, b) = F
δ'(E, c) = δ'(E, d) = φ
δ'(E, f) = G
δ'(F, a) = ε-closure (δ((q11, q12, q13), a))
δ'(F, a) = ε-closure (φ) = φ
δ'(F, b) = ε-closure (q12) = F
δ'(F, c) = δ'(F, d) = φ
δ'(F, f) = G
δ'(G, a) = δ'(G, b) = δ'(G, c) = δ'(G, d) = δ'(G, f) = φ
δ'(G, #) = ε-closure (q15) i.e. state H
δ'(G, #) = H

Example : Construct the NFA for the following regular expression using Thompson's construction. Apply the subset construction method to convert it into DFA.
(a + b)* abb
Solution : The NFA using Thompson's construction is as follows
ε-closure(q0) = {q0, q1, q2, q4, q7}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
ε-closure(q3) = {q1, q2, q3, q4, q6}
ε-closure(q4) = {q4}
e-dlosure(qz) = (a7) e-closure(qs) = (98,99) e-closure (qo) = (as) e-dlosure(qio) = (40/4) e-dlosure qu) = (aul ¢-dlosure(qi2) = (x2) Let, €-closure(qo) = (41, 41, 92,4: S(a.a) = €-closure (6(49/41/92/44/47)/9)) = e-dlosure (5 (go/8)¥ (41,2) 842.4) a7) Calli A y8(qs-9) 5147.2) = e-closure (q3U 48) “= €-closure (q 3)U8-closure (4s) = (q1/42-43/44-46)U (48-49) layaz/a3,a4edea4-as) Call ita sate B (Aa = B B/(A,b) = €~ closure (6 (g9,41/42/44°47)-b)) = e~dosure (q5) 41-9244-45-46) Callas state C aa) = C 8'(B.a) = e-closure((8(qs,42/43/44/46/49-99)4)) ~dlosure (q 3) . = (qi-92-43-44-d6h = Call ita state D 5B =D 81B,b) = e-closure (8(q1-42-43/44-96,48/49)-b)) = &-dlosure (45 q 10) — See = TECHNICAL PURLCATONS®- an pat rode 2.45 Laven Anaya complerDewgn 2a haar Anti = €~dlosure(qs}Ue~closure(q i) = lay92/44/48/46410/400) = Call it as state E E e-closure ((5(4 1-42/44/45-4.),a)) = € closure (q 3) D 8(C.b) = e-closure ((8(q1-42-44-45-46)-b)) = e-closure (qs) 8G) = C B(D,a) = e~closure (5(q1-42-43-44-46).a)) ‘e-closure (q3) D e-closure (5(q1-912/43/44-46)b)) closure (qs) 5D») = C B(E,a) = €-closure ((5(q1/42-44-45 46-410, 411)-)) = €-closure (qs) 8a) = D B(E,a) = €—closure ((5(q1/42-44-45/46-410, 111)-b)) e-dlosure (5, a2) = €-closure(qs}e~dlosure(a a} = Call it as state F 3) = F B(F,a) — closure ((8(q 1-42/44-45-46-412).4)) © e-dowure(q3) 3a) = D BEEb) = e~closure ((8(q3-42-44-45-46-412)-b)) e~dlosure (qs) Comper Desion eee ese eememre 2Feeee Solution : e- dlosure (qo) = (90/41 4243) = dlosure (41) = (41-43) e- dlosure (q2) = (92-41 43) = (41/42) 43) e- losure (q3) = (qs) a one 27 vical Anaiysis ‘We will obtain 8 transitions for each state for input a 5'Gqo, a) = €- closure (618% (go.#)a)) = € = eosure (6(¢-closure(qa))) = €- closure (6((q0/41-42/43),a)) = € - closure (q2U 43) = €- closure (q2) U € - closure (q 3) far a2, as} (a3) Bqo. 
δ'(q0, a) = {q1, q2, q3}

δ'(q1, a) = ε-closure(δ(ε-closure(q1), a))
          = ε-closure(δ({q1, q3}, a))
          = ε-closure(q2)
δ'(q1, a) = {q1, q2, q3}

δ'(q2, a) = ε-closure(δ(ε-closure(q2), a))
          = ε-closure(δ({q1, q2, q3}, a))
          = ε-closure(q2 ∪ q3)
          = ε-closure(q2) ∪ ε-closure(q3)
δ'(q2, a) = {q1, q2, q3}

δ'(q3, a) = ε-closure(δ(ε-closure(q3), a))
          = ε-closure(δ(q3, a))
          = ε-closure(∅)
δ'(q3, a) = ∅

The transition table will be

State    Input a
q0       {q1, q2, q3}
q1       {q1, q2, q3}
q2       {q1, q2, q3}
q3       ∅

[Figure: NFA with ε-moves on states q0 to q14]

ε-closure(q1) = {q1, q2, q4}
ε-closure(q2) = {q2}
ε-closure(q3) = {q3, q6, q1, q2, q4, q7, q8} = {q1, q2, q3, q4, q6, q7, q8}
ε-closure(q4) = {q4}
ε-closure(q5) = {q5, q6, q7, q8, q1, q2, q4} = {q1, q2, q4, q5, q6, q7, q8}
ε-closure(q6) = {q6, q7, q8, q1, q2, q4} = {q1, q2, q4, q6, q7, q8}
ε-closure(q7) = {q7, q8}
ε-closure(q8) = {q8}
ε-closure(q9) = {q9, q10, q12, q13}
ε-closure(q10) = {q10}
ε-closure(q11) = {q11, q12, q13}
ε-closure(q12) = {q12, q13}
ε-closure(q13) = {q13}
ε-closure(q14) = {q14}
ε-closure(q0) = {q0, q1, q2, q4, q7, q8}. Call it A.

δ'(A, a) = ε-closure(δ({q0, q1, q2, q4, q7, q8}, a))
         = ε-closure(q3 ∪ q9)
         = ε-closure(q3) ∪ ε-closure(q9)
         = {q1, q2, q3, q4, q6, q7, q8} ∪ {q9, q10, q12, q13}
         = {q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13}. Call it as state B.
δ'(A, a) = B

δ'(A, b) = ε-closure(δ({q0, q1, q2, q4, q7, q8}, b))
         = ε-closure(q5)
         = {q1, q2, q4, q5, q6, q7, q8}. Call it as state C.
δ'(A, b) = C

δ'(B, a) = ε-closure(δ({q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13}, a))
         = ε-closure(q3 ∪ q9 ∪ q14)
         = {q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13, q14}. Call it as state D.
δ'(B, a) = D

δ'(B, b) = ε-closure(δ({q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13}, b))
         = ε-closure(q5 ∪ q11)
         = {q1, q2, q4, q5, q6, q7, q8, q11, q12, q13}. Call it as state E.
δ'(B, b) = E

δ'(C, a) = ε-closure(δ({q1, q2, q4, q5, q6, q7, q8}, a))
         = ε-closure(q3 ∪ q9)
δ'(C, a) = B

δ'(C, b) = ε-closure(δ({q1, q2, q4, q5, q6, q7, q8}, b))
         = ε-closure(q5)
δ'(C, b) = C

δ'(D, a) = ε-closure(δ({q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13, q14}, a))
         = ε-closure(q3 ∪ q9 ∪ q14)
δ'(D, a) = D

δ'(D, b) = ε-closure(δ({q1, q2, q3, q4, q6, q7, q8, q9, q10, q12, q13, q14}, b))
         = ε-closure(q5 ∪ q11)
δ'(D, b) = E

δ'(E, a) = ε-closure(δ({q1, q2, q4, q5, q6, q7, q8, q11, q12, q13}, a))
         = ε-closure(q3 ∪ q9 ∪ q14)
δ'(E, a) = D

δ'(E, b) = ε-closure(δ({q1, q2, q4, q5, q6, q7, q8, q11, q12, q13}, b))
         = ε-closure(q5)
δ'(E, b) = C

The transition table is

State    a    b
A        B    C
B        D    E
C        B    C
D        D    E
E        D    C
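The subset construction used in the worked examples can be sketched in a few lines of Python. This is a minimal sketch, not the book's own program: the NFA below is an assumption, with its transitions inferred from the ε-closures listed in the last example (states are written as integers 0 to 14 instead of q0 to q14), and the names `eps_closure`, `move` and `subset_construction` are hypothetical helpers chosen for illustration.

```python
from collections import deque

EPS = ""  # label used for ε-moves in this sketch

# Assumed NFA, inferred from the ε-closures of the example:
# (state, symbol) -> set of next states
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6},
    (6, EPS): {1, 7}, (7, EPS): {8},
    (8, "a"): {9}, (9, EPS): {10, 12},
    (10, "b"): {11}, (11, EPS): {12},
    (12, EPS): {13}, (13, "a"): {14},
}
START, FINAL, ALPHABET = 0, 14, "ab"


def eps_closure(states):
    """All states reachable from `states` using ε-moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` on one `symbol` transition."""
    out = set()
    for s in states:
        out |= NFA.get((s, symbol), set())
    return out


def subset_construction():
    """Return (names, table, finals) of the DFA built from the NFA."""
    start = eps_closure({START})
    names = {start: "A"}          # NFA state set -> DFA state name
    table = {}                    # (DFA state, symbol) -> DFA state
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue          # no transition, i.e. ∅
            if target not in names:
                names[target] = chr(ord("A") + len(names))
                queue.append(target)
            table[(names[current], symbol)] = names[target]
    finals = {name for subset, name in names.items() if FINAL in subset}
    return names, table, finals
```

Calling `subset_construction()` on this assumed NFA should give the five states A to E of the example, with D as the only state containing q14. States are named in breadth-first order so that the letters line up with the hand computation.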
2. Direct Method

In this method the NFA with ε is built using Thompson's construction method. The ε is eliminated and then the NFA without ε is converted to DFA. Let us understand this method with the help of example.

Construct deterministic finite automaton to accept the regular expression (0+1)*(00+11)(0+1)*

Solution : The FA for r.e. = (0+1)*(00+11)(0+1)* is as follows -

[Figure: Steps 1 to 5, stepwise construction of the transition graph for (0+1)*(00+11)(0+1)*]

Step 6 : We now design transition table from above drawn transition graph -

[Table: transition table of the NFA without ε]

The {q0, q1} is assumed to be a new state [q0, q1] and {q0, q2} is assumed to be a new state [q0, q2]. Hence we will find input transitions on these states.

δ([q0, q1], 0) = {q0, q1, qf} = [q0, q1, qf] → new state
δ([q0, q1], 1) = {q0, q2} = [q0, q2]

Now we will apply input 0 and 1 transitions on the newly formed states.

δ([q0, q1, qf], 0) = [q0, q1, qf]
δ([q0, q1, qf], 1) = [q0, q2, qf] → new state
δ([q0, q2], 0) = [q0, q1]
δ([q0, q2], 1) = [q0, q2, qf]
δ([q0, q2, qf], 0) = [q0, q1, qf]
δ([q0, q2, qf], 1) = [q0, q2, qf]

The minimized DFA will then be -

[Table: transition table of the minimized DFA]

If we assume A = [q0], B = [q0, q1], C = [q0, q2], D = [q0, q1, qf] then the DFA will be -

[Figure: DFA with states A, B, C and D]

Construct FA from given regular expression 10 + (0 + 11)0*1

Solution : First we will construct the transition diagram for given regular expression.

[Figure: stepwise construction of the transition diagram for 10 + (0 + 11)0*1]

Now we have got NFA without ε. Now we will convert it to required DFA. For that, we will first write a transition table for this NFA.

[Table: transition table of the NFA]

The equivalent DFA will be

[Figure: equivalent DFA]
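The last stage of the direct method, converting an NFA without ε into a DFA, can be checked with a short Python sketch. The NFA table below is an assumption read off the worked transitions for (0+1)*(00+11)(0+1)* (state names q0, q1, q2 and qf included), and `nfa_to_dfa` and `accepts` are hypothetical helper names, not taken from the book.

```python
# Assumed NFA without ε for (0+1)*(00+11)(0+1)*:
# (state, symbol) -> set of next states
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0", "q2"},
    ("q1", "0"): {"qf"},
    ("q2", "1"): {"qf"},
    ("qf", "0"): {"qf"},
    ("qf", "1"): {"qf"},
}
START, FINALS = "q0", {"qf"}


def nfa_to_dfa(nfa, start, alphabet="01"):
    """Subset construction for an NFA that has no ε-moves."""
    start_set = frozenset({start})
    dfa = {}                      # (state set, symbol) -> state set
    seen = {start_set}
    pending = [start_set]
    while pending:
        current = pending.pop()
        for symbol in alphabet:
            # Union of the NFA moves of every member of the current set
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return dfa, start_set


def accepts(dfa, start_set, finals, text):
    """Run the DFA on `text`; accept if the last set holds a final state."""
    state = start_set
    for ch in text:
        state = dfa[(state, ch)]
    return bool(state & finals)
```

For example, `nfa_to_dfa(NFA, START)` produces the combined states [q0, q1], [q0, q2], [q0, q1, qf] and [q0, q2, qf] in addition to [q0], and `accepts` can then be used to try strings such as "1001" (which contains 00) against the result. The sketch stops before minimization, so [q0, q1, qf] and [q0, q2, qf] are kept as separate states.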
Construct DFA for the regular expression (010 + 00)*(10)*

Solution : We will construct DFA for r.e. (010 + 00)*(10)* using direct method.

[Figure: stepwise construction starting from Step 1, ending in the required DFA]

Fig. 2.8.12 Required DFA

Construct DFA accepting the strings of binary digits which are even numbers.

Solution :

[Figure: Fig. 2.8.13]

The strings that end with 0 are the even numbers.

Tree Method

In this method the regular expression can be directly converted to DFA by constructing a syntax tree. Let us understand this method with the help of examples.

Solution : The regular expression is said to be augmented if it is of the form (r)#. We will construct a syntax tree T for (r)# i.e. (a|b)*# and then compute four functions - 1) nullable 2) firstpos 3) lastpos 4) followpos, by traversing T. Finally we will construct DFA from followpos.

Step 1 :

[Figure: syntax tree for (a|b)*# with leaf positions 1 (a), 2 (b) and 3 (#)]

Step 2 : We will compute nullable and firstpos functions. The rules for this are :
- n is a leaf labelled ε : nullable(n) is true, firstpos(n) = ∅.
- n is a leaf with position i : nullable(n) is false, firstpos(n) = {i}.
- n is an or node with left child a and right child b : nullable(n) = nullable(a) or nullable(b), firstpos(n) = firstpos(a) ∪ firstpos(b).
- n is a concatenation node with left child a and right child b : nullable(n) = nullable(a) and nullable(b); firstpos(n) = firstpos(a) ∪ firstpos(b) if nullable(a), otherwise firstpos(a).
- n is a Kleene closure node with child a : nullable(n) is true, firstpos(n) = firstpos(a).

The rules for lastpos are same as the rules of firstpos except that a and b are reversed.

The computation of followpos function will be using these rules -

Rule 1 : If node n represents concatenation with left child a and right child b and x is a position in lastpos(a) then all positions in firstpos(b) are in followpos(x).

Rule 2 : If n is a Kleene closure node and x is a position in lastpos(n) then all positions in firstpos(n) are in followpos(x).

The firstpos, lastpos and followpos for each node are as follows -

[Figure: syntax tree annotated with firstpos, lastpos and followpos]

Step 3 : Now we will construct DFA.

Step a : The firstpos of root is {1, 2, 3}. Mark this as set A. Thus A will be the start state in DFA. Now consider input a from the set {1, 2, 3} - position 1 represents input a (position 3 is the end marker # and its followpos is empty). So followpos(1) = {1, 2, 3} i.e. A. From set {1, 2, 3} - position 2 represents input b. So followpos(2) = {1, 2, 3} i.e. A.

Dtran[A, a] = A
Dtran[A, b] = A

As no new states are getting generated, the DFA will be

[Figure: single state A with a loop on a and b]

Fig. 2.8.14 DFA for (a|b)*
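The tree method can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the syntax tree is written by hand as nested tuples (no regular expression parser is included), ε-leaves are left out because the example does not need them, and the names `analyse` and `build_dfa` are hypothetical. The comments mark where Rule 1 and Rule 2 for followpos are applied.

```python
def analyse(node, followpos, symbols):
    """Return (nullable, firstpos, lastpos) of `node`; fill followpos."""
    kind = node[0]
    if kind == "leaf":                       # ("leaf", symbol, position)
        _, symbol, pos = node
        symbols[pos] = symbol
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "or":                         # ("or", left, right)
        n1, f1, l1 = analyse(node[1], followpos, symbols)
        n2, f2, l2 = analyse(node[2], followpos, symbols)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":                        # ("cat", left, right)
        n1, f1, l1 = analyse(node[1], followpos, symbols)
        n2, f2, l2 = analyse(node[2], followpos, symbols)
        for x in l1:                         # Rule 1
            followpos[x] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":                       # ("star", child)
        _, first, last = analyse(node[1], followpos, symbols)
        for x in last:                       # Rule 2
            followpos[x] |= first
        return True, first, last
    raise ValueError(f"unknown node kind: {kind}")


def build_dfa(tree, alphabet, end_marker="#"):
    """Build Dtran from followpos; states are named A, B, C, ..."""
    followpos, symbols = {}, {}
    _, first, _ = analyse(tree, followpos, symbols)
    start = frozenset(first)
    names = {start: "A"}
    dtran = {}
    pending = [start]
    while pending:
        state = pending.pop(0)
        for ch in alphabet:
            # Union of followpos(p) over positions p labelled with ch
            target = set()
            for p in state:
                if symbols[p] == ch:
                    target |= followpos[p]
            if not target:
                continue
            target = frozenset(target)
            if target not in names:
                names[target] = chr(ord("A") + len(names))
                pending.append(target)
            dtran[(names[state], ch)] = names[target]
    end_positions = {p for p, s in symbols.items() if s == end_marker}
    finals = {name for st, name in names.items() if st & end_positions}
    return dtran, finals


# Syntax tree for the augmented expression (a|b)*#
TREE = ("cat",
        ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
        ("leaf", "#", 3))
```

`build_dfa(TREE, "ab")` gives Dtran[A, a] = A and Dtran[A, b] = A, and marks A as final because it contains the position of #. A state is treated as final when it contains the end marker position, which is the usual convention for this construction.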
