Prerequisite - Introduction of Compiler Design

We basically have two phases of compilers, namely the analysis phase and the synthesis phase. The analysis phase creates an intermediate representation from the given source code. The synthesis phase creates an equivalent target program from the intermediate representation.

A compiler is a software program that converts high-level source code written in a programming language into low-level machine code that can be executed by the computer hardware. The process of converting the source code into machine code involves several phases or stages, which are collectively known as the phases of a compiler. The typical phases of a compiler are:

1. Lexical Analysis: The first phase of a compiler is lexical analysis, also known as scanning. This phase reads the source code and breaks it into a stream of tokens, which are the basic units of the programming language. The tokens are then passed on to the next phase for further processing.

2. Syntax Analysis: The second phase of a compiler is syntax analysis, also known as parsing. This phase takes the stream of tokens generated by the lexical analysis phase and checks whether they conform to the grammar of the programming language. The output of this phase is usually an Abstract Syntax Tree (AST).

3. Semantic Analysis: The third phase of a compiler is semantic analysis. This phase checks whether the code is semantically correct, i.e. whether it conforms to the language's type system and other semantic rules. In this stage, the compiler checks the meaning of the source code to ensure that it makes sense. The compiler performs type checking, which ensures that variables are used correctly and that operations are performed on compatible data types.
The compiler also checks for other semantic errors, such as undeclared variables and incorrect function calls.

4. Intermediate Code Generation: The fourth phase of a compiler is intermediate code generation. This phase generates an intermediate representation of the source code that can be easily translated into machine code.

5. Optimization: The fifth phase of a compiler is optimization. This phase applies various optimization techniques to the intermediate code to improve the performance of the generated code.

6. Code Generation: The final phase of a compiler is code generation. This phase takes the optimized intermediate code and generates the actual machine code that can be executed by the target hardware.

In summary, the phases of a compiler are: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation.

Symbol Table - It is a data structure being used and maintained by the compiler, consisting of all the identifiers' names along with their types. It helps the compiler to function smoothly by finding the identifiers quickly.

The analysis of a source program is divided into mainly three phases:

1. Linear Analysis - This involves a scanning phase where the stream of characters is read from left to right. It is then grouped into various tokens having a collective meaning.

2. Hierarchical Analysis - In this analysis phase, based on a collective meaning, the tokens are categorized hierarchically into nested groups.

3. Semantic Analysis - This phase is used to check whether the components of the source program are meaningful or not.

The compiler has two modules, namely the front end and the back end. The front end constitutes the lexical analyzer, semantic analyzer, syntax analyzer, and intermediate code generator. The rest are assembled to form the back end.

1. Lexical Analyzer - It is also called a scanner. It takes the output of the preprocessor (which performs file inclusion and macro expansion) as the input, which is in a pure high-level language.
It reads the characters from the source program and groups them into lexemes (sequences of characters that "go together"). Each lexeme corresponds to a token. Tokens are defined by regular expressions, which are understood by the lexical analyzer. It also removes lexical errors (e.g. erroneous characters), comments, and white space.

2. Syntax Analyzer - It is sometimes called a parser. It constructs the parse tree. It takes all the tokens one by one and uses a Context-Free Grammar to construct the parse tree.

Why Grammar? The rules of programming can be entirely represented in a few productions. Using these productions we can represent what the program actually is. The input has to be checked to see whether it is in the desired format or not.

The parse tree is also called the derivation tree. Parse trees are generally constructed to check for ambiguity in the given grammar. There are certain rules associated with the derivation tree:

- Any identifier is an expression.
- Any number can be called an expression.
- Performing any operation in the given expression will always result in an expression. For example, the sum of two expressions is also an expression.
- The parse tree can be compressed to form a syntax tree.

Syntax errors can be detected at this level if the input is not in accordance with the grammar.

3. Semantic Analyzer - It verifies the parse tree, whether it is meaningful or not, and furthermore produces a verified parse tree. It also does type checking, label checking, and flow-control checking.

4. Intermediate Code Generator - It generates intermediate code, which is a form that can be readily executed by a machine. There are many popular intermediate codes; three-address code is one example. Intermediate code is converted to machine language using the last two phases, which are platform dependent. Up to intermediate code, the representation is the same for every compiler out there, but after that, it depends on the platform.
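As a concrete illustration of three-address code, the sketch below flattens a left-to-right chain of binary operations into instructions with at most one operator each, introducing temporaries t1, t2, .... This is a simplification (no operator precedence, and a real compiler would generate this from the AST, not a token list); all names here are invented for illustration:

```python
# Sketch: translate a right-hand side of the form "x OP y OP z ..." into
# three-address code, introducing one temporary per operation.
# Illustrative only; no precedence handling.

def three_address(target, tokens):
    code = []
    temp_count = 0
    left = tokens[0]
    i = 1
    while i < len(tokens):
        op, right = tokens[i], tokens[i + 1]
        temp_count += 1
        temp = f"t{temp_count}"
        code.append(f"{temp} = {left} {op} {right}")
        left = temp
        i += 2
    code.append(f"{target} = {left}")
    return code

for line in three_address("a", ["b", "+", "c", "+", "d"]):
    print(line)
# t1 = b + c
# t2 = t1 + d
# a = t2
```

Each generated instruction has at most one operator on its right-hand side, which is exactly the property that makes three-address code easy to translate into machine instructions.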
To build a new compiler, we don't need to build it from scratch. We can take the intermediate code from an already existing compiler and build the last two parts.

5. Code Optimizer - It transforms the code so that it consumes fewer resources and produces more speed. The meaning of the code being transformed is not altered. Optimization can be categorized into two types: machine-dependent and machine-independent.

6. Target Code Generator - The main purpose of the target code generator is to write code that the machine can understand; it also handles register allocation, instruction selection, etc. The output is dependent on the type of assembler. This is the final stage of compilation. The optimized code is converted into relocatable machine code, which then forms the input to the linker and loader.

All these six phases are associated with the symbol-table manager and the error handler, as shown in the block diagram above.

The following are advantages of compilers:

1. Portability: Compilers allow programs to be written in a high-level programming language, which can be executed on different hardware platforms without the need for modification. This means that programs can be written once and run on multiple platforms, making them more portable.
2. Optimization: Compilers can apply various optimization techniques to the code, such as loop unrolling, dead-code elimination, and constant propagation, which can significantly improve the performance of the generated machine code.

3. Error Checking: Compilers perform a thorough check of the source code, which can detect syntax and semantic errors at compile time, thereby reducing the likelihood of runtime errors.

4. Maintainability: Programs written in high-level languages are easier to understand and maintain than programs written in low-level assembly language. Compilers help in translating high-level code into machine code, making programs easier to maintain and modify.

5. Productivity of developers: Developers can write code faster in high-level languages, which can increase their productivity.

In summary, compilers provide advantages such as portability, optimization, error checking, maintainability, and productivity.

Introduction of Lexical Analysis

Lexical analysis is the first phase of the compiler, also known as scanning. It converts the high-level input program into a sequence of tokens.

- Lexical analysis can be implemented with Deterministic Finite Automata.
- The output is a sequence of tokens that is sent to the parser for syntax analysis.

(Diagram: input program -> Lexical Analyzer -> tokens -> Syntax Analyzer.)

What is a Token? A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming language.
Examples of tokens:

- Type tokens: id, number, real, . . .
- Punctuation tokens: IF, void, return, . . .
- Alphabetic tokens (keywords): for, while, if, etc.
- Identifiers: variable names, function names, etc.
- Operators: '+', '++', '-', etc.
- Separators: ',', ';', etc.

Examples of non-tokens:

- Comments, preprocessor directives, macros, blanks, tabs, newlines, etc.

Lexeme: The sequence of characters matched by a pattern to form the corresponding token, or a sequence of input characters that comprises a single token, is called a lexeme. E.g. "float", "abs_zero_Kelvin", "=", "-273", ";".

How Lexical Analyzer Works?

1. Input preprocessing: This stage involves cleaning up the input text and preparing it for lexical analysis. This may include removing comments, whitespace, and other non-essential characters from the input text.

2. Tokenization: This is the process of breaking the input text into a sequence of tokens. This is usually done by matching the characters in the input text against a set of patterns or regular expressions that define the different types of tokens.

3. Token classification: In this stage, the lexer determines the type of each token. For example, in a programming language, the lexer might classify keywords, identifiers, operators, and punctuation symbols as separate token types.

4. Token validation: In this stage, the lexer checks that each token is valid according to the rules of the programming language. For example, it might check that a variable name is a valid identifier or that an operator has the correct syntax.
5. Output generation: In this final stage, the lexer generates the output of the lexical analysis process, which is typically a list of tokens. This list of tokens can then be passed to the next stage of compilation or interpretation.

- The lexical analyzer identifies errors with the help of the automaton machine and the grammar of the given language on which it is based, like C or C++, and gives the row number and column number of the error.

Suppose we pass a statement through the lexical analyzer: a = b + c. It will generate a token sequence like this: id = id + id, where each id refers to a variable in the symbol table referencing all details.

For example, consider the program:

int main()
{
  // 2 variables
  int a, b;
  a = 10;
  return 0;
}

The valid tokens are: 'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'. You can observe that we have omitted the comment.

As another example, consider the below printf statement. It has five tokens:

printf ( "GeeksQuiz" ) ;
   1   2      3      4 5

Exercise 1: Count the number of tokens:

int main()
{
  int a = 10, b = 20;
  printf("sum is:%d", a + b);
  return 0;
}

Answer: Total number of tokens: 27.

Exercise 2: Count the number of tokens: int max(int i);

- The lexical analyzer reads 'int' and finds it to be valid, accepting it as a token.
- 'max' is read and found to be a valid function name after reading '('.
- 'int' is also a token, then 'i' is another token, and finally ')' and ';'.

Answer: Total number of tokens is 7: int, max, (, int, i, ), ;.

We can represent lexemes and tokens as under:

Lexeme    Token
a         IDENTIFIER
=         ASSIGNMENT
b         IDENTIFIER
+         ARITHMETIC
c         IDENTIFIER

Advantages:

1. Simplifies Parsing: Breaking down the source code into tokens makes it easier for computers to understand and work with the code.
This helps programs like compilers or interpreters figure out what the code is supposed to do. It's like breaking down a big puzzle into smaller pieces.

2. Error Detection: Lexical analysis will detect lexical errors such as misspelled keywords or undefined symbols early in the compilation process. This helps in improving the overall efficiency.

3. Efficiency: Once the source code is converted into tokens, subsequent phases of compilation or interpretation can operate more efficiently. Parsing and semantic analysis become faster and more streamlined when working with tokenized input.

Disadvantages:

1. Limited Context: Lexical analysis operates based on individual tokens and does not consider the overall context of the code. This can sometimes lead to ambiguity or misinterpretation of the code's intended meaning, especially in languages with complex syntax or semantics.

2. Overhead: Although lexical analysis is necessary for the compilation or interpretation process, it adds an extra layer of overhead. Tokenizing the source code requires additional computational resources, which can impact the overall performance of the compiler or interpreter.

3. Debugging Challenges: Lexical errors detected during the analysis phase may not always provide clear indications of their origins in the original source code. Debugging such errors can be challenging, especially if they result from subtle mistakes in the lexical analysis process.
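The token-counting exercises above can be reproduced with a rough tokenizer. This is an illustrative sketch only (the regular expression covers just string literals, identifiers, numbers, and common punctuation), not a production C lexer:

```python
import re

# Crude tokenizer sketch mirroring the token-counting exercises above.
# It strips // line comments, ignores whitespace, and matches string
# literals, identifiers/keywords, numbers, and common punctuation.

TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|==|[-+*/=;,(){}<>:]')

def tokenize(source):
    source = re.sub(r"//.*", "", source)  # drop // line comments
    return TOKEN_RE.findall(source)

print(tokenize("int max(int i);"))
# ['int', 'max', '(', 'int', 'i', ')', ';']  -> 7 tokens
print(len(tokenize('printf("GeeksQuiz");')))  # -> 5
```

Running it on Exercise 2's input yields the same 7 tokens counted above, and the printf example yields 5.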
What is Lexical Analysis?

Lexical analysis is the starting phase of the compiler. It gathers modified source code, written in the form of sentences, from the language preprocessor. The lexical analyzer is responsible for breaking these syntaxes into a series of tokens, removing whitespace in the source code. If the lexical analyzer gets an invalid token, it generates an error. It reads the stream of characters and seeks the legal tokens, and then the data is passed to the syntax analyzer when it is asked for.

Lexer Terminologies

There are three terminologies:

- Token: a sequence of characters that represents a unit of information in the source code.
- Pattern: the description used by the token is known as a pattern.
- Lexeme: a sequence of characters in the source code, as per the matching pattern of a token, is known as a lexeme. It is also called an instance of a token.

The Architecture of Lexical Analyzer

To read the input characters in the source code and produce tokens is the most important task of a lexical analyzer. The lexical analyzer goes through the entire source code and identifies each token one by one. The scanner is responsible for producing tokens when it is requested by the parser. The lexical analyzer avoids the whitespace and comments while creating these tokens. If any error occurs, the analyzer correlates these errors with the source code and line number.

Roles and Responsibility of Lexical Analyzer

The lexical analyzer performs the following tasks:

- It reads the input characters of the source program and produces tokens as output.
- It eliminates comments and white spaces from the source program.
- It correlates error messages with the source program and its line numbers.
What is the role of the lexical analyzer in compiler design?

Lexical analysis is the first phase of the compiler, where a lexical analyzer operates as an interface between the source code and the rest of the phases of a compiler. It reads the input characters of the source program, groups them into lexemes, and produces a sequence of tokens for each lexeme. The tokens are sent to the parser for syntax analysis.

If the lexical analyzer is located as a separate pass in the compiler, it may need an intermediate file to place its output, from which the parser would then take its input. To eliminate the need for the intermediate file, the lexical analyzer and the syntactic analyzer (parser) are often grouped into the same pass, where the lexical analyzer operates either under the control of the parser or as a subroutine with the parser.

The lexical analyzer also interacts with the symbol table while passing tokens to the parser. Whenever a token is discovered, the lexical analyzer returns a representation for that token to the parser.
If the token is a simple construct, such as a parenthesis, comma, or colon, then it returns an integer code to the parser. If the token is a more complex item, such as an identifier or another token with a value, the value is also passed to the parser.

The lexical analyzer separates the characters of the source language into groups that logically belong together, called tokens. A token includes the token name, which is an abstract symbol that defines a type of lexical unit, and an optional attribute value, called the token value. Tokens can be identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses. A rule that describes the group of input strings for which the same token is produced as output is called the pattern. Regular expressions play an essential role in specifying patterns. If a keyword is treated as a token, the pattern is just a sequence of characters. For identifiers and various other tokens, patterns form a more complex structure.

The lexical analyzer also handles issues such as stripping out comments and whitespace (tab, newline, blank, and other characters that are used to separate tokens in the input), and correlating the error messages generated by the compiler during lexical analysis with the source program.
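The token-name/attribute-value pairing described above can be sketched as follows. All names here (KEYWORDS, to_token) are invented for illustration; the attribute of an identifier is shown as an index into a symbol table, while simple punctuation carries the lexeme itself:

```python
# Sketch of (token-name, attribute-value) pairs: keywords carry no
# attribute, punctuation carries the lexeme, numbers carry their value,
# and identifiers carry a slot in the symbol table. Hypothetical layout.

KEYWORDS = {"int", "return", "if", "while"}

def to_token(lexeme, symbol_table):
    if lexeme in KEYWORDS:
        return (lexeme.upper(), None)          # keyword: name only
    if len(lexeme) == 1 and lexeme in ",;(){}":
        return ("PUNCT", lexeme)
    if lexeme.isdigit():
        return ("NUM", int(lexeme))
    index = symbol_table.setdefault(lexeme, len(symbol_table))
    return ("ID", index)                       # attribute = table slot

symtab = {}
print(to_token("count", symtab))  # -> ('ID', 0)
print(to_token("42", symtab))     # -> ('NUM', 42)
print(to_token("int", symtab))    # -> ('INT', None)
```

Note that looking up the same identifier twice returns the same symbol-table slot, which is how the parser can tell that two occurrences of `count` name the same variable.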
What is Input Buffering in Compiler Design?

Lexical analysis has to access secondary memory each time to identify tokens. This is time-consuming and costly. So, the input strings are stored in a buffer and then scanned by the lexical analyzer.

The lexical analyzer scans the input string from left to right, one character at a time, to identify tokens. It uses two pointers to scan tokens:

- Begin Pointer (bptr) - It points to the beginning of the string to be read.
- Look Ahead Pointer (lptr) - It moves ahead to search for the end of the token.

Example - For the statement int a, b;

- Both pointers start at the beginning of the string, which is stored in the buffer.

  (Diagram: buffer cells i n t _ a , _ b ; with bptr and lptr both at 'i'.)

- The Look Ahead Pointer scans the buffer until the token is found.
- The character ("blank space") beyond the token 'int' has to be examined before the token 'int' can be determined.
- After processing the token 'int', both pointers are set to the next token 'a', and this process is repeated for the whole program.
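The two-pointer scan just described can be sketched as follows (simplified to alphanumeric lexemes and single-character punctuation; the function name is invented):

```python
# Sketch of the two-pointer scan: bptr marks the start of the current
# lexeme, lptr moves right until a delimiter ends the token. Simplified:
# only alphanumeric lexemes and single-character punctuation.

def scan(buffer):
    tokens = []
    bptr = 0
    while bptr < len(buffer):
        if buffer[bptr].isspace():
            bptr += 1                      # skip blanks between tokens
            continue
        if buffer[bptr].isalnum():
            lptr = bptr
            while lptr < len(buffer) and buffer[lptr].isalnum():
                lptr += 1                  # look ahead to the lexeme's end
            tokens.append(buffer[bptr:lptr])
            bptr = lptr                    # both pointers move to next token
        else:
            tokens.append(buffer[bptr])    # punctuation like ',' or ';'
            bptr += 1
    return tokens

print(scan("int a, b;"))  # -> ['int', 'a', ',', 'b', ';']
```

Note how the inner loop must read the blank after 'int' before the token can be emitted, exactly as in the example above.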
A buffer can be divided into two halves. If the Look Ahead pointer moves past the halfway point of the first half, the second half is filled with new characters to be read. If the Look Ahead pointer moves toward the right end of the second half, the first half is filled with new characters, and so it goes on.

(Diagram: a buffer split into a First Half and a Second Half, with bptr and lptr scanning across them.)

Input Buffering techniques:

Sentinels - Sentinels are used to make a check: each time the forward pointer is moved, a check is completed to ensure that one half of the buffer has not been moved off. If it has, then the other half must be reloaded.

Buffer Pairs - A specialized buffering technique can decrease the amount of overhead needed to process an input character while transferring characters. It includes two buffers, each of N-character size, which are reloaded alternately.

Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the start of the current lexeme, which is yet to be discovered. forward scans ahead until a match for a pattern is discovered. Once the lexeme is found, lexemeBegin is set to the character directly after the lexeme just discovered, and forward is set to the character at its right end.
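The buffer-pair scheme above can be sketched as follows. N is tiny here for illustration (real lexers use e.g. 4096), sentinels are omitted to keep the sketch short, and the function name is invented:

```python
import io

# Sketch of the buffer-pair scheme: two N-character halves of one buffer
# are refilled alternately as the forward pointer crosses from one half
# into the other. Simplified: no sentinels, reads every character.

N = 4  # half-buffer size (tiny, for illustration)

def read_with_buffer_pairs(stream):
    buffer = [""] * (2 * N)
    loaded_half = -1          # which half was filled most recently
    out = []
    forward = 0               # the forward (look-ahead) pointer
    while True:
        half = (forward // N) % 2
        if half != loaded_half:            # crossed into the other half:
            chunk = stream.read(N)         # reload it from the input
            buffer[half * N:(half + 1) * N] = (
                list(chunk) + [""] * (N - len(chunk)))
            loaded_half = half
        ch = buffer[forward % (2 * N)]
        if ch == "":                       # end of input reached
            break
        out.append(ch)
        forward += 1
    return "".join(out)

print(read_with_buffer_pairs(io.StringIO("int a, b;")))  # -> int a, b;
```

Each half is read from the input with a single `read(N)` call, which is the point of the technique: one system call per N characters instead of one per character.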
Preliminary Scanning - Certain processes are best performed as characters are moved from the source file to the buffer. For example, it can delete comments. Languages like FORTRAN, which ignore blanks, can delete them from the character stream. It can also collapse strings of several blanks into one blank. Pre-processing the character stream being subjected to lexical analysis saves the trouble of moving the look-ahead pointer back and forth over a string of blanks.

SPECIFICATION OF TOKENS

There are 3 specifications of tokens:

1) Strings
2) Language
3) Regular expression

Strings and Languages

- An alphabet or character class is a finite set of symbols.
- A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
- A language is any countable set of strings over some fixed alphabet.

In language theory, the terms "sentence" and "word" are often used as synonyms for "string". The length of a string s, usually written |s|, is the number of occurrences of symbols in s. For example, banana is a string of length six. The empty string, denoted ε, is the string of length zero.

Operations on strings

The following string-related terms are commonly used:

1. A prefix of string s is any string obtained by removing zero or more symbols from the end of string s. For example, ban is a prefix of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not-necessarily-consecutive positions of s. For example, baan is a subsequence of banana.
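The string-operation definitions above can be checked directly. The helper names below are invented; the subsequence test uses the standard iterator trick of consuming the string in order:

```python
# Check the string-operation definitions above on s = "banana".
# (Simple brute-force definitions, fine for short strings.)

def is_prefix(p, s):
    return s.startswith(p)

def is_suffix(p, s):
    return s.endswith(p)

def is_substring(p, s):
    return p in s

def is_subsequence(p, s):
    it = iter(s)
    return all(ch in it for ch in p)   # each char found, in order

s = "banana"
print(is_prefix("ban", s))        # True
print(is_substring("nan", s))     # True
print(is_subsequence("baan", s))  # True
print(len(s))                     # 6
```

Every substring is also a subsequence, but not the other way round: baan is a subsequence of banana yet not a substring, because its characters are not consecutive.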
Operations on languages

The following are the operations that can be applied to languages:

1. Union
2. Concatenation
3. Kleene closure
4. Positive closure

The following example shows the operations on languages. Let L = {0, 1} and D = {a, b}. Then:

1. Union: L ∪ D = {0, 1, a, b}
2. Concatenation: LD = {0a, 0b, 1a, 1b}
3. Kleene closure: L* = {ε, 0, 1, 00, 01, 10, 11, 000, ...}
4. Positive closure: L+ = {0, 1, 00, 01, 10, 11, 000, ...}

Regular Expressions

Each regular expression r denotes a language L(r). Here are the rules that define the regular expressions over some alphabet Σ and the languages that those expressions denote:

1. ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.
2. If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with 'a' in its one position.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
   a) (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
   b) (r)(s) is a regular expression denoting the language L(r)L(s).
   c) (r)* is a regular expression denoting (L(r))*.
   d) (r) is a regular expression denoting L(r).
4. The unary operator * has the highest precedence and is left associative.
5. Concatenation has the second-highest precedence and is left associative.
6. | has the lowest precedence and is left associative.

Regular set

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s. There are a number of algebraic laws for regular expressions that can be used to manipulate them into equivalent forms. For instance, r|s = s|r is commutative, and r|(s|t) = (r|s)|t is associative.
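The language operations above can be sketched on small finite sets. Since L* is infinite, the sketch truncates the Kleene closure to strings of length at most max_len (function names are invented):

```python
# Union, concatenation, and truncated Kleene closure on finite languages,
# matching the L = {0, 1} example above.

def concat(L1, L2):
    return {x + y for x in L1 for y in L2}

def kleene(L, max_len):
    result = {""}                  # epsilon is always in L*
    frontier = {""}
    while True:
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len}
        if not frontier - result:  # nothing new below the length bound
            break
        result |= frontier
    return result

L = {"0", "1"}
D = {"a", "b"}
print(sorted(L | D))        # ['0', '1', 'a', 'b']   (union)
print(sorted(concat(L, D))) # ['0a', '0b', '1a', '1b']
print(sorted(kleene(L, 2))) # ['', '0', '00', '01', '1', '10', '11']
```

The positive closure L+ is simply the Kleene closure with the empty string removed, i.e. `kleene(L, n) - {""}`.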
Regular Definitions

Giving names to regular expressions is referred to as a regular definition. If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:

d1 -> r1
d2 -> r2
...
dn -> rn

where each di is a distinct name, and each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}.

Example: Identifiers are the set of strings of letters and digits beginning with a letter. The regular definition for this set is:

letter -> A | B | ... | Z | a | b | ... | z
digit  -> 0 | 1 | ... | 9
id     -> letter (letter | digit)*

Shorthands

Certain constructs occur so frequently in regular expressions that it is convenient to introduce notational shorthands for them, such as + (one or more instances), ? (zero or one instance), and character classes like [a-z].

Recognition of tokens with transition diagrams

As an intermediate step in constructing a lexical analyzer, patterns are often converted into stylized flowcharts called transition diagrams. Positions in a transition diagram are drawn as circles and are called states; states are connected by arrows called edges, labeled with input symbols. One state is marked as the initial (start) state, where control begins, and certain states are marked as accepting (final) states, indicating that a lexeme matching a pattern has been found.

A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input. Tokens recognized this way include:

- Keywords (int, if, while, etc.)
- Identifiers (variable names, function names, etc.)
- Operators (+, -, *, /, etc.)
- Constants (numbers, string literals, etc.)
- Delimiters/punctuators like comma (,), semicolon (;), and braces ({ })

Tokens can be identified using a dictionary of keywords, patterns that match certain sequences of characters, or regular expressions.

What is LEX in Compiler Design?

Before understanding LEX, let us first understand what lexical analysis is. Lexical analysis is the first phase of compiler design. It takes the input as a stream of characters and gives keywords, operators, constants, and all other character units as tokens. It has three phases:

1. Tokenization: It takes the stream of characters and converts it into tokens.
2. Error Messages: It gives errors related to lexical analysis, such as exceeding length or an unmatched string.
3. Eliminate Comments: It eliminates all the comments, blank spaces, new lines, and indentations.

Lex

Lex is a tool or computer program that generates lexical analyzers (it converts the stream of characters into tokens). The Lex tool itself is a compiler: the Lex compiler takes the input and transforms it into input patterns. It is commonly used with YACC (Yet Another Compiler Compiler). It was written by Mike Lesk and Eric Schmidt.

Function of Lex

1. In the first step, the source code, which is written in the Lex language and has the file name File.l, is given as input to the Lex compiler, commonly known as Lex, to get the output lex.yy.c.
2. After that, the output lex.yy.c will be used as input to the C compiler, which gives the output in the form of an a.out file. Finally, the output file a.out takes the stream of characters and generates tokens as output.

lex.yy.c: it is a C program.
File.l: it is a Lex source program.
a.out: it is a lexical analyzer.

Lex File Format

A Lex program consists of three parts, separated by %% delimiters:

declarations
%%
translation rules
%%
auxiliary procedures

- Declarations: The declarations include declarations of variables.
- Translation rules: These rules consist of a Pattern and an Action.
- Auxiliary procedures: The auxiliary section holds auxiliary functions used in the actions.

For example:

/* declarations */
number [0-9]+
%%
/* translation rules */
{number} {return NUM;}
%%
/* auxiliary functions */
int numbersum() { ... }

LEX

- Lex is a program that generates a lexical analyzer. It is used with the YACC parser generator.
- The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
- It reads the input stream and produces the source code as output by implementing the lexical analyzer in a C program.

The function of Lex is as follows:

- First, the lexical analyzer creates a program lex.l in the Lex language. Then the Lex compiler runs the lex.l program and produces a C program lex.yy.c.
- Finally, the C compiler runs the lex.yy.c program and produces an object program a.out.
- a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.

(Diagram: program text -> lexical analyzer (a.out) -> sequence of tokens.)

Lex file format

A Lex program is separated into three sections by %% delimiters. The format of the Lex source is as follows:

{ definitions }
%%
{ rules }
%%
{ user subroutines }

Definitions include declarations of constants, variables, and regular definitions.

Rules define the statements of the form p1 {action1} p2 {action2} ... pn {actionn}, where pi describes the regular expression and actioni describes what action the lexical analyzer should take when pattern pi matches a lexeme.

User subroutines are auxiliary procedures needed by the actions. The subroutines can be loaded with the lexical analyzer and compiled separately.

Introduction of Finite Automata

Finite Automata (FA) is the simplest machine to recognize patterns. It is used to characterize a Regular Language. It is also used to analyze and recognize natural-language expressions. A finite automaton or finite state machine is an abstract machine that has five elements or tuples.
It has a set of states and rules for moving from one state to another, depending on the applied input symbol. Based on the states and the set of rules, the input string is either accepted or rejected. Basically, it is an abstract model of a digital computer which reads an input string and changes its internal state depending on the current input symbol. Every automaton defines a language, i.e. a set of strings.

The essential features of a general automaton are:
1. Input
2. Output
3. States
4. State relation
5. Output relation

A Finite Automaton consists of the following:
* Finite set of states
* Set of input symbols
* Initial state
* Set of final states
* Transition function

The formal specification of the machine is (Q, Σ, q0, F, δ). FA is characterized into two types:

1) Deterministic Finite Automata (DFA): A DFA consists of the 5-tuple (Q, Σ, q0, F, δ):
Q: set of all states
Σ: set of input symbols (symbols which the machine takes as input)
q0: initial state (starting state of the machine)
F: set of final states
δ: transition function, defined as δ : Q × Σ → Q

In a DFA, for a particular input character, the machine goes to one state only. A transition function is defined on every state for every input symbol. Also, in a DFA, a null (or ε) move is not allowed: a DFA cannot change state without any input character.

For example, construct a DFA which accepts the language of all strings ending with 'a'. Given: Σ = {a, b}, Q = {q0, q1}, F = {q1}.
First, consider the language set of all possible acceptable strings in order to construct an accurate state transition diagram:

L = {a, aa, aaa, aaaa, ...}

The above is a simple subset of the possible acceptable strings; there are many other strings which end with 'a' and contain both symbols, e.g. {aa, ba, bba, bbaa, aba, abba, aaba, baa}. Strings that are not accepted include b, ab, aab, etc.

State transition table for the above automaton:

    State     a      b
    -> q0     q1     q0
    *q1       q1     q0

(-> marks the start state, * marks a final state.)

One important thing to note is that there can be many possible DFAs for a pattern. A DFA with a minimum number of states is generally preferred.

2) Nondeterministic Finite Automata (NFA): An NFA is similar to a DFA except for the following additional features:
1. A null (or ε) move is allowed, i.e. it can move forward without reading symbols.
2. It has the ability to transition to any number of states for a particular input.

However, these features do not add any power to the NFA. If we compare the two in terms of power, both are equivalent.

Due to the above additional features, an NFA has a different transition function; the rest is the same as a DFA.

δ: Transition function
δ : Q × (Σ ∪ {ε}) → 2^Q

As you can see in the transition function, for any input (including null or ε) the NFA can go to any combination of states, i.e. to a set of states. For example, below is an NFA for the above problem (all strings ending with 'a'):

    State     a           b
    -> q0     {q0, q1}    {q0}
    *q1       ∅           ∅

One important thing to note is that in an NFA, if any path for an input string leads to a final state, then the input string is accepted. For example, in the above NFA, there are multiple paths for the input string "aa". Since one of the paths leads to a final state, "aa" is accepted by the above NFA.

Some important points:
1. Every DFA is an NFA, but not vice versa. Justification: all the tuples in a DFA and an NFA are the same except for the transition function δ. In the case of a DFA, δ : Q × Σ → Q; in the case of an NFA, δ : Q × (Σ ∪ {ε}) → 2^Q. Observe that Q × Σ → Q is a special case of Q × (Σ ∪ {ε}) → 2^Q: on the right-hand side, Q is contained in 2^Q (each single state corresponds to a singleton set), so every DFA transition function is also a valid NFA transition function, but the reverse is not true. So mathematically we can conclude that every DFA is an NFA, but not vice versa. Yet there is a way to convert an NFA to a DFA, so there exists an equivalent DFA for every NFA.
2. There can be multiple final states in both a DFA and an NFA.
3. The NFA is more of a theoretical concept.
4. The DFA is used in Lexical Analysis in compilers.
5. If the number of states in the NFA is N, then its equivalent DFA can have a maximum of 2^N states.
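To make the DFA/NFA comparison concrete, here is a small sketch (an illustration, not code from the article) that simulates both machines above on the language of strings over {a, b} ending in 'a'. The DFA tracks a single current state; the NFA tracks a *set* of current states, which is exactly the idea behind the subset construction used to convert an NFA to a DFA, and the source of the 2^N bound on the number of DFA states.

```python
# DFA for strings over {a, b} ending in 'a' (transition table from above):
# q0 is the start state, q1 is the only final state.
DFA = {("q0", "a"): "q1", ("q0", "b"): "q0",
       ("q1", "a"): "q1", ("q1", "b"): "q0"}

def dfa_accepts(s, start="q0", finals={"q1"}):
    state = start
    for ch in s:
        state = DFA[(state, ch)]        # exactly one next state per symbol
    return state in finals

# NFA for the same language: q0 loops on a/b and "guesses" when the
# final 'a' arrives by also moving to q1 (no epsilon moves needed here).
NFA = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}}

def nfa_accepts(s, start="q0", finals={"q1"}):
    states = {start}                    # track the set of reachable states
    for ch in s:
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
    return bool(states & finals)        # accepted if any path reaches a final state

# Both machines agree on every string.
for w in ["aa", "ba", "b", "ab", ""]:
    assert dfa_accepts(w) == nfa_accepts(w) == w.endswith("a")
```

Notice that each value of `states` in the NFA simulation is itself one state of the equivalent DFA; since an N-state NFA has at most 2^N distinct subsets of states, the converted DFA has at most 2^N states.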
