
Uploaded by Rahul Sharma

Role of the Lexical Analyzer


- Groups input characters into tokens
- An expensive phase of the compiler
- Ways to construct:
  - Scanner generator: flex
  - Handwritten in a high-level language
  - Handwritten in assembly language

Lexical Analysis

CS 5300 - SJAllan

- Returns tokens to the syntax analyzer
- Strips white space
- Keeps track of line numbers
- Generates an output listing with errors marked
- Deletes comments
- Expands macros, if the language has them
- Converts numbers to internal form


Terminology

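- Token: the kind of symbol returned to the parser, e.g. identifier
- Lexeme: the actual character sequence matched, e.g. SquareRoot
- Pattern: the rule describing the lexemes of a token, e.g. letter(letter|digit)*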


Decomposition of Grammar

Determine what the lexical analyzer recognizes vs. what the syntax analyzer recognizes. The lexical analyzer handles the basic symbols:

- Delimiters
- Identifiers
- Constants

The Input

Sequence of characters


The Output

A series of tokens:

- Punctuation: ( ) ; , [ ]
- Operators: + - * :=
- Keywords: begin end if while try
- Identifiers: SquareRoot
- String literals: "press Enter to continue"
- Character literals: 'x'
- Numeric literals:
  - Integer: 123
  - Floating point: 45.23e+2
  - Based representation: 0xaa


Free-form languages (most modern ones)

- White space does not matter. Ignore these:
  - Tabs, spaces, new lines, carriage returns

Layout is critical

- Fortran: labels in columns 1-6
- COBOL: areas A and B
- The lexical analyzer must know about the layout to find tokens


Punctuation: Separators

- Typically individual special characters such as { } ;
- Sometimes double characters: the lexical scanner looks for the longest token:
  - (* and /* are comment openers in various languages
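The longest-match rule can be sketched in C as follows. The enum and `match_punct` are hypothetical names chosen for illustration: the scanner tries each two-character token before its one-character prefix, so `/*` wins over `/` and `(*` over `(`.

```c
#include <stddef.h>

/* Hypothetical token kinds for a few separators and comment openers. */
enum punct_kind { P_NONE, P_LPAREN, P_PAREN_STAR, P_SLASH, P_SLASH_STAR, P_SEMI };

/* Longest match: try the two-character tokens first, then fall back to
   the single-character ones. *len receives the number of characters used. */
enum punct_kind match_punct(const char *s, size_t *len)
{
    if (s[0] == '(' && s[1] == '*') { *len = 2; return P_PAREN_STAR; }
    if (s[0] == '/' && s[1] == '*') { *len = 2; return P_SLASH_STAR; }
    switch (s[0]) {
    case '(': *len = 1; return P_LPAREN;
    case '/': *len = 1; return P_SLASH;
    case ';': *len = 1; return P_SEMI;
    default:  *len = 0; return P_NONE;
    }
}
```

A real scanner would apply the same rule to every operator with a longer form (for example `:=` before `:`); flex does this automatically by preferring the longest match.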

Operators

- Like punctuation
  - No real difference for the lexical analyzer
  - Typically single or double special characters
  - Operators: + - == <=
  - Operations: := =>

Separators and operators are returned as a token kind, and perhaps a location for error messages and debugging purposes.


Keywords

- Reserved identifiers
  - E.g. BEGIN, END in Pascal; if in C; catch in C++
- Returned as a kind of token
  - With possible location information
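One common way to recognize keywords is to scan them as identifiers and then look the lexeme up in a table of reserved words. A minimal sketch, assuming hypothetical token codes and a case-sensitive comparison:

```c
#include <string.h>

/* Hypothetical token kinds; a real scanner would share these with the parser. */
enum kw_kind { IDENT, KW_BEGIN, KW_END, KW_IF, KW_WHILE };

struct keyword { const char *text; enum kw_kind kind; };

/* Reserved words, kept sorted so the table could also be binary-searched. */
static const struct keyword keywords[] = {
    { "BEGIN", KW_BEGIN }, { "END", KW_END },
    { "IF", KW_IF },       { "WHILE", KW_WHILE },
};

/* Return the keyword's token kind, or IDENT if the lexeme is not reserved. */
enum kw_kind classify_word(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].kind;
    return IDENT;
}
```

For a case-insensitive language such as Pascal, the lexeme would be folded to one case before the lookup.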

Identifiers

- Rules differ:
  - Length
  - Allowed characters
  - Separators
- Returned as:
  - Token kind
  - Name of the identifier
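Since the rules on length and characters vary by language, the sketch below assumes a hypothetical 31-character limit and identifiers of the form letter(letter|digit)*, with no separator characters such as `_`. `scan_identifier` is an illustrative name, not from the slides.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_IDENT 31  /* hypothetical limit; real languages differ */

/* Scan an identifier of the form letter(letter|digit)* starting at src.
   Copies at most MAX_IDENT characters into name (NUL-terminated) and
   returns the total number of characters consumed, or 0 if src does not
   start with a letter. Characters beyond the limit are consumed but dropped. */
size_t scan_identifier(const char *src, char name[MAX_IDENT + 1])
{
    size_t i = 0;
    if (!isalpha((unsigned char)src[0]))
        return 0;
    while (isalnum((unsigned char)src[i])) {
        if (i < MAX_IDENT)
            name[i] = src[i];
        i++;
    }
    name[i < MAX_IDENT ? i : MAX_IDENT] = '\0';
    return i;
}
```

The returned length tells the caller where the next token starts; the copied name is the data returned with the identifier token.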


String Literals

- Text must be stored; the actual characters are important
  - Not like identifiers: must preserve case
- Table needed
  - We will use a linked list
- Returns:
  - String constant token
  - Actual string

Character Literals

- Similar issues to string literals
- Lexical analyzer returns:
  - Token kind
  - Identity of the character
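A minimal sketch of a string-literal table kept as a linked list. `struct str_entry` and `enter_string` are hypothetical names, and in this version identical literals share one entry. Each string is copied byte for byte, so case is preserved.

```c
#include <stdlib.h>
#include <string.h>

/* One node of a hypothetical string-literal table kept as a linked list. */
struct str_entry {
    char *text;
    struct str_entry *next;
};

static struct str_entry *str_table = NULL;

/* Return the entry holding `text`, adding a copy at the head of the list
   if it is not already present. Returns NULL if malloc fails. */
struct str_entry *enter_string(const char *text)
{
    for (struct str_entry *e = str_table; e != NULL; e = e->next)
        if (strcmp(e->text, text) == 0)
            return e;

    size_t n = strlen(text) + 1;
    struct str_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->text = malloc(n);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->text, text, n);   /* exact copy, case preserved */
    e->next = str_table;
    str_table = e;
    return e;
}
```

The scanner would return the string constant token together with the entry pointer, which stands for the actual string.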


Numeric Literals

- Integer
  - Return the integer constant token
  - Return the value of the integer constant
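Returning the value means converting the lexeme's digits to an integer. A sketch with overflow checking, assuming a hypothetical helper named `lexeme_to_int` and a decimal `int` target:

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Convert a decimal lexeme such as "123" into its value.
   Returns false if the lexeme is empty, contains a non-digit, or would
   overflow int; the scanner could then report an error. */
bool lexeme_to_int(const char *lexeme, int *value)
{
    int v = 0;
    if (*lexeme == '\0')
        return false;
    for (; *lexeme != '\0'; lexeme++) {
        if (!isdigit((unsigned char)*lexeme))
            return false;
        int d = *lexeme - '0';
        if (v > (INT_MAX - d) / 10)   /* v * 10 + d would exceed INT_MAX */
            return false;
        v = v * 10 + d;
    }
    *value = v;
    return true;
}
```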

Handling Comments

- Comments have no effect on the program
- Are eliminated by the scanner
- Error detection issues
  - E.g. unclosed comments
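A sketch of comment removal that also detects the unclosed-comment case, assuming C-style /* ... */ comments and a hypothetical `skip_comment` helper that the scanner calls after seeing the opener. It also keeps the line count correct across multi-line comments.

```c
#include <stddef.h>

/* Skip a C-style comment whose opening slash-star is at src[0] and src[1].
   On success, returns a pointer just past the closing star-slash and
   increments *line for every newline inside the comment. Returns NULL for
   an unclosed comment, which the scanner should report as an error. */
const char *skip_comment(const char *src, int *line)
{
    const char *p = src + 2;          /* step over the opening slash-star */
    for (; *p != '\0'; p++) {
        if (*p == '\n')
            (*line)++;
        else if (p[0] == '*' && p[1] == '/')
            return p + 2;
    }
    return NULL;                      /* reached end of input first */
}
```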


Case Equivalence

- Some languages are case-insensitive
  - Pascal, Ada
- Others are case-sensitive
  - C, Java

Performance Issues

- Speed
  - Lexical analysis can become a bottleneck
  - Minimize processing per character
  - Skip blanks fast
  - I/O is also an issue (read large blocks)
- We compile frequently
  - Compilation time is important
  - Especially during development
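One way to act on the speed and I/O advice is to read the source in large blocks with `fread` and hand out characters from the scanner's own array, so most characters cost only an index check. This is a sketch under assumptions: `struct src_buf`, `next_char`, `skip_blanks`, and the 64 KiB block size are all hypothetical.

```c
#include <stdio.h>

#define BUF_SIZE 65536   /* hypothetical block size; tune for the platform */

/* A minimal input buffer refilled one large block at a time. */
struct src_buf {
    FILE *fp;
    char data[BUF_SIZE];
    size_t pos, len;
};

/* Return the next character, refilling the buffer when it runs dry;
   EOF at end of input. */
int next_char(struct src_buf *b)
{
    if (b->pos == b->len) {
        b->len = fread(b->data, 1, BUF_SIZE, b->fp);
        b->pos = 0;
        if (b->len == 0)
            return EOF;
    }
    return (unsigned char)b->data[b->pos++];
}

/* Skip blanks in a tight loop, counting newlines for line numbers.
   Returns the first non-blank character, or EOF. */
int skip_blanks(struct src_buf *b, int *line)
{
    int c;
    while ((c = next_char(b)) == ' ' || c == '\t' || c == '\n' || c == '\r')
        if (c == '\n')
            (*line)++;
    return c;
}
```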


General Approach

- Define a set of token kinds:
  - An enumeration type, or integers
  - Some tokens carry associated data
    - Identifier: name of the identifier
    - Constant: value of the constant
- Either: convert the entire file to a file of tokens
  - The lexical analyzer is a separate phase
- Or: the parser calls the lexical analyzer each time it needs the next token
  - This approach avoids extra I/O
  - The parser builds the tree incrementally, using successive tokens as tree nodes
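The token kinds and their associated data might be declared as below. The names are hypothetical; the `union` holds the identifier's name or the constant's value, and `line` carries the location used in error messages. `token_to_string` is an illustrative helper for a debugging listing.

```c
#include <stdio.h>

/* The set of token kinds as an enumeration (names are hypothetical). */
enum token_kind { T_EOF, T_IDENT, T_INT_CONST, T_ASSIGN, T_SEMI, T_BEGIN, T_END };

/* A token: its kind plus the associated data some kinds carry. */
struct token {
    enum token_kind kind;
    int line;                 /* location, for error messages */
    union {
        const char *name;     /* T_IDENT: name of the identifier */
        int int_val;          /* T_INT_CONST: value of the constant */
    } u;
};

/* Format a token into buf; returns what snprintf returns (the length). */
int token_to_string(const struct token *t, char *buf, size_t size)
{
    switch (t->kind) {
    case T_IDENT:     return snprintf(buf, size, "IDENT(%s)", t->u.name);
    case T_INT_CONST: return snprintf(buf, size, "INT(%d)", t->u.int_val);
    default:          return snprintf(buf, size, "KIND(%d)", (int)t->kind);
    }
}
```

With the second approach, the parser would call a function such as a hypothetical `next_token()` that returns one `struct token` per call, instead of reading a file of tokens.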


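RE -> NFA -> DFA -> MFA -> LA (regular expression, nondeterministic finite automaton, deterministic finite automaton, minimal finite automaton, lexical analyzer)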

Regular Expressions

Regular expressions (REs) are defined by an alphabet (terminal symbols) and three operations:

- Alternation: RE1 | RE2
- Concatenation: RE1 RE2
- Repetition: RE* (also called Kleene closure)


Construct           Examples
Single characters   a b c d \x
Alternation         [bcd] [b-z] ab|cd
Any character       . (period)
Sequence            x* y+
Concatenation       abc[d-q]
Optional RE         [0-9]+(\.[0-9]*)?
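These notations can be tried out with POSIX `<regex.h>` extended regular expressions, whose syntax overlaps with flex for the constructs above (this is the POSIX library, not flex itself). `full_match` is a hypothetical helper name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if s matches the POSIX extended RE `pattern`.
   regexec finds a match anywhere in s, so callers anchor the pattern
   with ^ and $ to require a whole-string match. */
bool full_match(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                         /* invalid pattern */
    bool ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

In a C string literal the backslash before the dot must be doubled, so the optional-fraction pattern is written `"^[0-9]+(\\.[0-9]*)?$"`; it accepts `123`, `45.23` and `7.` but rejects `.5`.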

Precedence in REs (highest to lowest)

1. Kleene closure (left associative)
2. Concatenation (left associative)
3. Alternation (left associative)
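A worked example of these rules: closure binds tighter than concatenation, which binds tighter than alternation, so

```latex
a\,b^{*} \mid c \;\equiv\; \bigl(a\,(b^{*})\bigr) \mid c,
\qquad
L\bigl(a\,b^{*} \mid c\bigr) = \{\, a\,b^{n} \mid n \ge 0 \,\} \cup \{ c \},
\qquad
a \mid b \mid c \;\equiv\; (a \mid b) \mid c .
```

So `abbb` and `c` match, but `abc` does not; to repeat `ab` as a unit, write `(ab)*`.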


Examples of REs

- a*
- (a|b)*
- (ε|a|b)(a|b)(a|b)(a|b)*
- BEGIN | END | IF | THEN | ELSE
- letter(letter|digit)*
- (digit)(digit)*
- A|B|C|…|Y|Z
- 0|1|…|9


Using flex

Flex source program (cpsl.l) -> Flex compiler -> lex.yy.c (Unix) or lexyy.c (Windows)

lex.yy.c or lexyy.c -> C/C++ compiler -> a.out

Input stream -> a.out -> Sequence of tokens


{ definitions }
%%
{ rules }
%%
{ programmer subroutines }

Definition

Any combination of:

- Definitions: name space translation
- Included code: an indented line (space, then code)
- Included code: %{ code %}


Rules

Any number of rules of the form

Expression { Action }

- Expression is a regular expression that describes the token (the pattern for the token)
- Action is C/C++ code to be executed when the pattern is matched
  - If it is more than a single statement, it should be enclosed in braces

yytext

Variable where the lexeme is kept: a character string that is reused for every token

yyleng

Length of the string in yytext

yylval

Variable through which a value associated with the token (e.g. a copy of the lexeme or its numeric value) is returned to the parser

yywrap

Function called when EOF is encountered


%{
#include <string.h>
#include "utility.h"
#include "pascal.tab.h"
%}
letter  [a-zA-Z]
digit   [0-9]
lord    [a-zA-Z0-9]
%%
BEGIN               {return(BEGINSY);}
END                 {return(ENDSY);}
WHILE               {return(WHILESY);}
...
{letter}({lord})*   {yylval.name_ptr = strdup(yytext); return(IDENTSY);}
({digit})+          {yylval.int_val = intnum(); return(CONSTANTSY);}
":="                {return(ASSIGNSY);}
":"                 {return(COLONSY);}
...
.                   {error("Illegal character");}
%%
int intnum ()
/* convert character string into an integer */
{
    ...
} /* intnum */

Finite Automata

Example DFA and its transition table (figure): states 0 to 5; inputs (, *, ), Other
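A table-driven DFA over the same four input classes, (, *, ) and Other, can be sketched in C. Assuming the example recognizes Pascal-style `(* ... *)` comments, the states below are our own construction and may not match the numbering in the figure; `is_comment` is a hypothetical name.

```c
#include <stdbool.h>

/* Input classes: the column headings of the transition table. */
enum { C_LPAREN, C_STAR, C_RPAREN, C_OTHER, N_CLASSES };

/* Hypothetical states: start, seen "(", inside the comment,
   seen a star inside, comment closed, and a dead error state. */
enum { S_START, S_OPEN, S_BODY, S_STAR, S_DONE, S_ERR, N_STATES };

static int char_class(char c)
{
    switch (c) {
    case '(': return C_LPAREN;
    case '*': return C_STAR;
    case ')': return C_RPAREN;
    default:  return C_OTHER;
    }
}

/* delta[state][class] gives the next state. */
static const int delta[N_STATES][N_CLASSES] = {
    /*            (        *        )        Other  */
    [S_START] = { S_OPEN,  S_ERR,   S_ERR,   S_ERR  },
    [S_OPEN]  = { S_ERR,   S_BODY,  S_ERR,   S_ERR  },
    [S_BODY]  = { S_BODY,  S_STAR,  S_BODY,  S_BODY },
    [S_STAR]  = { S_BODY,  S_STAR,  S_DONE,  S_BODY },
    [S_DONE]  = { S_ERR,   S_ERR,   S_ERR,   S_ERR  },
    [S_ERR]   = { S_ERR,   S_ERR,   S_ERR,   S_ERR  },
};

/* Accept s iff it is exactly one comment of the form "(* ... *)". */
bool is_comment(const char *s)
{
    int state = S_START;
    for (; *s != '\0'; s++)
        state = delta[state][char_class(*s)];
    return state == S_DONE;
}
```

The scanner's per-character work is one table lookup, which is why generated scanners are usually table-driven.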


Formal Definition

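A deterministic finite-state automaton, or DFA, is a five-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where:

1. $Q$ is a finite set of states
2. $\Sigma$ is the alphabet of the machine
3. $\delta: Q \times \Sigma \to Q$ is the state transition function
4. $q_0 \in Q$ is the start state
5. $F \subseteq Q$ is the set of final states

A configuration is written $(q, w)$, where $q$ is a state and $w$ is the string remaining.

- $(q_0, w)$ is the initial configuration
- $(q, \varepsilon)$ is a final configuration if $q \in F$
- $\vdash$ indicates a move: $(q, aw) \vdash (q', w)$ iff $a \in \Sigma$, $w \in \Sigma^*$, and $q' = \delta(q, a)$

The language accepted by $M$ is

$L(M) = \{\, w \in \Sigma^* \mid (q_0, w) \vdash^* (q, \varepsilon) \text{ for some } q \in F \,\}$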


Finite Automata

Consider the two FAs M2 and M3 shown (figure).

What are L(M2) and L(M3)? What is important about M3? Why is it important?

Difference between:

- NFA: arbitrary choices permitted in transitions
  - Given a terminal symbol, there may be a choice of which state to go to
  - There may be empty (ε) moves, which don't consume input
- DFA: no choice allowed on any move


Building an NFA from an RE (figure): fragments for ε, for a, for A|B, for AB, and for A*


Construct the NFA for the RE (ab|aba)*
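One possible answer, simplified by hand to have no empty moves (so it is not the fragment-by-fragment construction), is a four-state NFA. The sketch simulates it by tracking the set of states the NFA could be in, stored as a bit mask; the NFA and all names are hypothetical.

```c
#include <stdbool.h>

/* A hypothetical epsilon-free NFA for (ab|aba)*; bit i of a mask = state i.
     0 -a-> 1, 0 -a-> 2   (choice: start of "ab" or of "aba")
     1 -b-> 0             (finished "ab")
     2 -b-> 3, 3 -a-> 0   (finished "aba")
   State 0 is both the start state and the only final state. */
enum { NSTATES = 4, START = 1u << 0, FINAL = 1u << 0 };

/* move_on[state][symbol] = set of successors (symbol 0 = 'a', 1 = 'b'). */
static const unsigned move_on[NSTATES][2] = {
    { (1u << 1) | (1u << 2), 0 },
    { 0,                     1u << 0 },
    { 0,                     1u << 3 },
    { 1u << 0,               0 },
};

/* Apply one input symbol to a whole set of NFA states. */
unsigned nfa_step(unsigned set, int sym)
{
    unsigned next = 0;
    for (int q = 0; q < NSTATES; q++)
        if (set & (1u << q))
            next |= move_on[q][sym];
    return next;
}

/* Accept w (over {a, b}) iff some path ends in a final state. */
bool nfa_accepts(const char *w)
{
    unsigned set = START;
    for (; *w != '\0'; w++) {
        if (*w != 'a' && *w != 'b')
            return false;
        set = nfa_step(set, *w == 'b');
    }
    return (set & FINAL) != 0;
}
```

Because both choices on `a` are followed together, no backtracking is needed: for input `abaab` the state sets are {0}, {1,2}, {0,3}, {0,1,2}, {1,2}, {0,3}, and the last contains final state 0.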


NFA to DFA

Applying Algorithm

State   Input   New State
A       a       B
B       b       C
C       a       D
D       a       B
D       b       C
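The subset construction can be sketched in C for a hand-built, epsilon-free NFA of (ab|aba)* (states 0 to 3, state 0 both start and final), so no ε-closure step is needed. This NFA is an assumption rather than one taken from the slides; numbering DFA states in discovery order happens to reproduce the transitions listed for states A to D above. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical epsilon-free NFA for (ab|aba)* as bit masks
   (bit i = NFA state i; symbol 0 = 'a', 1 = 'b'). */
#define NSYMS 2
static const unsigned nfa_move[4][NSYMS] = {
    { (1u << 1) | (1u << 2), 0 },   /* 0 -a-> {1,2} */
    { 0,                     1u << 0 },   /* 1 -b-> {0} */
    { 0,                     1u << 3 },   /* 2 -b-> {3} */
    { 1u << 0,               0 },   /* 3 -a-> {0} */
};

#define MAX_DSTATES 16               /* at most 2^4 subsets */

/* dstate[i] is the NFA-state set of DFA state i (named 'A' + i);
   dtran[i][s] is the target DFA state, or -1 for the empty set. */
struct dfa {
    int n;
    unsigned dstate[MAX_DSTATES];
    int dtran[MAX_DSTATES][NSYMS];
};

static unsigned move_set(unsigned set, int sym)
{
    unsigned next = 0;
    for (int q = 0; q < 4; q++)
        if (set & (1u << q))
            next |= nfa_move[q][sym];
    return next;
}

/* Subset construction; DFA states are numbered in discovery order. */
void subset_construct(struct dfa *d)
{
    d->n = 1;
    d->dstate[0] = 1u << 0;                   /* start set {0} */
    for (int i = 0; i < d->n; i++) {          /* d->n grows as sets appear */
        for (int s = 0; s < NSYMS; s++) {
            unsigned u = move_set(d->dstate[i], s);
            int j = -1;
            if (u != 0) {
                j = 0;
                while (j < d->n && d->dstate[j] != u)
                    j++;
                if (j == d->n)
                    d->dstate[d->n++] = u;    /* new, unmarked DFA state */
            }
            d->dtran[i][s] = j;
        }
    }
}

/* A DFA state is final iff its set contains the NFA final state 0. */
bool dfa_final(const struct dfa *d, int i)
{
    return (d->dstate[i] & 1u) != 0;
}
```

Here A = {0}, B = {1, 2}, C = {0, 3} and D = {0, 1, 2}. Since a DFA state is final when it contains NFA state 0, this gives F = {A, C, D} and N = {B}, the initial partition used for minimization.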


DFA to MFA

Initial partitioning of states: F = {A, C, D}, N = {B}

Final MFA (figure)

