
Pune Vidyarthi Griha’s

COLLEGE OF ENGINEERING, NASHIK.

“ LANGUAGE TRANSLATOR ”

By
Prof. Anand N. Gharu
(Assistant Professor)
PVGCOE Computer Dept.

22nd Jan 2018

CONTENTS :-
1. Role of lexical analysis
2. Parsing, tokens, patterns, lexemes & lexical errors
3. Regular definitions for language constructs & strings
4. Sequences, comments & transition diagrams for
recognition of tokens, reserved words & identifiers
5. Introduction to Compiler & Interpreters
6. General model of Compiler
7. Compare compiler and interpreter
8. Use of interpreter & component of interpreter
9. Overview of Lex & YACC Specifications.

What’s a compiler?
• All computers only understand machine language

Figure: "This is a program" vs. 10000010010110100100101……

• Therefore, high-level language instructions must be translated
into machine language prior to execution


What’s a compiler?
• Compiler: a piece of system software that translates high-level languages into machine language

program.c:
    while (c != 'x') {
        if (c == 'a' || c == 'e' || c == 'i')
            printf("Congrats!");
        else if (c != 'x')
            printf("You Loser!");
    }

program.c → Compiler (gcc -o prog program.c) → prog: 10000010010110100100101……

Compiler
• Compiler:-
• These are system programs that automatically translate a high-level language program into a machine language program.

Source program (High level Lang. Prog.) → Compiler → Target program (M/C Lang. Prog.)

Types of Compiler
• Cross Assembler:-
• These are system programs that automatically translate an assembly language program compatible with M/C A into a machine language program compatible with M/C A.

Source program (Assembly Lang. Prog., compatible with M/C A) → Cross Assembler → Target program (M/C Lang. Prog., compatible with M/C A)
Machines in the figure: M/C A, M/C B

Types of Compiler
• Cross Compiler:-
• These are system programs that automatically translate an HLL program compatible with M/C A into a machine language program compatible with M/C A, but the underlying M/C is M/C B.

Source program (HLL Prog., compatible with M/C A) → Cross Compiler → Target program (M/C Lang. Prog., compatible with M/C A)
Machines in the figure: M/C A, M/C B


Interpreter
• A language translator that executes the source program line by line without translating it into machine language.
• It does not generate object code.
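The line-by-line idea can be sketched in C. This is only an illustration under stated assumptions: the three-statement toy language (ADD, SUB, MUL on a single accumulator) and the name run_program are hypothetical and do not come from the slides.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy language: each source line is "ADD n", "SUB n" or "MUL n".
 * run_program() interprets the lines one at a time and updates an accumulator
 * directly; no object code is produced.
 * Returns 0 on success, -1 on a malformed or over-long line. */
int run_program(const char *source, long *result)
{
    long acc = 0;
    const char *line = source;

    while (*line != '\0') {
        char buf[64], op[8];
        long n;
        size_t len = strcspn(line, "\n");    /* length of the current line */

        if (len >= sizeof buf)
            return -1;
        memcpy(buf, line, len);              /* isolate this one line */
        buf[len] = '\0';

        if (len > 0) {
            /* Decode and execute the statement straight from the source text. */
            if (sscanf(buf, "%7s %ld", op, &n) != 2)
                return -1;
            if (strcmp(op, "ADD") == 0)      acc += n;
            else if (strcmp(op, "SUB") == 0) acc -= n;
            else if (strcmp(op, "MUL") == 0) acc *= n;
            else return -1;                  /* unknown statement */
        }
        line += len;
        if (*line == '\n')
            line++;                          /* step to the next line */
    }
    *result = acc;
    return 0;
}
```

Calling run_program("ADD 5\nMUL 3\nSUB 2\n", &r) executes each statement as soon as it is read. A compiler would instead translate the whole program first and run the result later.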

Compiler vs Interpreter
[Figure: comparison with example languages C++ and Visual Basic]

Phases of Compiler

Structure of Compiler
• Any compiler must perform two major tasks
o Analysis of the source program
o Synthesis of a machine-language program

Structure of Compiler
Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code
Symbol and Attribute Tables (used by all phases of the compiler)

Structure of Compiler
Scanner (Lexical Analysis)
• The scanner begins the analysis of the source program by reading the input character by character and grouping characters into individual words and symbols (tokens).
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automata)
• DFA (Deterministic Finite Automata)
• LEX
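A hand-written scanner along these lines might look like the following C sketch. The token classes and the name next_token are assumptions made for illustration; a real scanner would also handle comments, multi-character operators and lexical errors.

```c
#include <ctype.h>
#include <stddef.h>

/* Token classes for this sketch (hypothetical names, not from the slides). */
enum token_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_SYMBOL };

/* Reads characters starting at *src, groups them into one token, copies the
 * token text (the lexeme) into buf, and advances *src past it.
 * cap is the size of buf and must be at least 1; longer lexemes are truncated. */
enum token_kind next_token(const char **src, char *buf, size_t cap)
{
    const char *p = *src;
    size_t n = 0;
    enum token_kind kind;

    while (isspace((unsigned char)*p))       /* skip blanks between tokens */
        p++;

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        kind = TOK_IDENT;                    /* letter followed by letters/digits */
        while (isalnum((unsigned char)*p)) {
            if (n + 1 < cap) buf[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        kind = TOK_NUMBER;                   /* one or more digits */
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < cap) buf[n++] = *p;
            p++;
        }
    } else {
        kind = TOK_SYMBOL;                   /* any other single character */
        if (n + 1 < cap) buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *src = p;
    return kind;
}
```

Repeated calls on the text "count1 = 42;" yield an identifier, a symbol, a number, a symbol, and then TOK_EOF.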

Structure of Compiler
Parser (Syntax Analysis)
• Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
• As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree.
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)

Structure of Compiler
Semantic Routines
• Perform two functions
o Check the static semantics of each construct
o Do the actual translation
• The heart of a compiler
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)

Structure of Compiler
Optimizer
• The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
• This phase can be very complex and slow.
• Peephole optimization, loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
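As a rough illustration of peephole optimization, the C sketch below slides a two-instruction window over a toy IR. The IR itself (LOAD, ADD, MUL acting on one accumulator) and the names ir_instr and peephole are hypothetical; real IRs and optimizers are much richer, and the sketch assumes operands small enough that adding them does not overflow.

```c
#include <stddef.h>

/* Hypothetical toy IR: each instruction applies op to a single accumulator. */
enum ir_op { IR_LOAD, IR_ADD, IR_MUL };
struct ir_instr { enum ir_op op; int operand; };

/* One peephole pass over code[0..n-1], rewriting the array in place:
 *   - ADD a; ADD b  is merged into  ADD a+b
 *   - ADD 0 and MUL 1 are dropped, since they cannot change the result
 * Returns the new instruction count. */
size_t peephole(struct ir_instr *code, size_t n)
{
    size_t out = 0;                          /* length of the rewritten prefix */

    for (size_t i = 0; i < n; i++) {
        struct ir_instr cur = code[i];

        if (cur.op == IR_ADD && out > 0 && code[out - 1].op == IR_ADD) {
            code[out - 1].operand += cur.operand;   /* merge adjacent ADDs */
            if (code[out - 1].operand == 0)
                out--;                              /* merged into ADD 0: drop */
            continue;
        }
        if ((cur.op == IR_ADD && cur.operand == 0) ||
            (cur.op == IR_MUL && cur.operand == 1))
            continue;                               /* no-op instruction */

        code[out++] = cur;
    }
    return out;
}
```

For the sequence LOAD 7, ADD 2, ADD 3, MUL 1, ADD 0 the pass leaves LOAD 7, ADD 5, which computes the same value with fewer instructions.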

Structure of Compiler
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator

Structure of Compiler
Scanner [Lexical Analyzer] → Tokens → Parser [Syntax Analyzer] → Parse tree → Semantic Process [Semantic analyzer] → Abstract Syntax Tree w/ Attributes → Code Generator [Intermediate Code Generator] → Non-optimized Intermediate Code → Code Optimizer → Optimized Intermediate Code → Code Optimizer → Target machine code

Lexical Analysis

Syntax Analysis

Semantic Analysis

Intermediate Code Generation

Code Optimization

Code Generation


Structure of Compiler Compiler writing tools • Compiler generators or compiler- compilers oE.g. Lex 52 . scanner and parser generators oExamples : Yacc.

Overview of Lex & Yacc
• Lex:
o Theory.
o Execution.
o Example.
• Yacc:
o Theory.
o Description.
o Example.
• Lex & Yacc linking.
• Demo.

Lex
• lex is a program (generator) that generates lexical analyzers (widely used on Unix).
• It is mostly used with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads the input stream (specifying the lexical analyzer) and outputs source code implementing the lexical analyzer in the C programming language.
• Lex reads patterns (regular expressions), then produces C code for a lexical analyzer that scans for identifiers.

STRUCTURE OF LEX

Lex
• Regular expressions are translated by lex to a computer program that mimics an FSA.
◦ A simple pattern: letter(letter|digit)*
• This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.
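The kind of finite-state program that such a pattern corresponds to can be sketched by hand in C. This is not the code lex generates; the state names and the function is_identifier are chosen here only for illustration, and "letter" and "digit" are taken to mean what isalpha and isdigit accept.

```c
#include <ctype.h>

/* States of a small DFA for letter(letter|digit)* (illustrative names). */
enum state { START, IN_IDENT, REJECT };

/* Returns 1 if the whole string s matches letter(letter|digit)*, else 0. */
int is_identifier(const char *s)
{
    enum state st = START;

    for (; *s != '\0' && st != REJECT; s++) {
        unsigned char c = (unsigned char)*s;
        switch (st) {
        case START:                          /* first character must be a letter */
            st = isalpha(c) ? IN_IDENT : REJECT;
            break;
        case IN_IDENT:                       /* then any letters or digits */
            st = isalnum(c) ? IN_IDENT : REJECT;
            break;
        default:
            break;
        }
    }
    return st == IN_IDENT;                   /* IN_IDENT is the accepting state */
}
```

The function accepts "x1" and "a", and rejects "1x" and the empty string. The only memory it uses is the current state, which is what makes it a finite-state machine.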

Lex
• Some limitations: Lex cannot be used to recognize nested structures such as parentheses, since it only has states and transitions between states.
• So, Lex is good at pattern matching, while Yacc is for more challenging tasks.
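To see why nesting needs more than states and transitions, consider checking balanced parentheses. The C sketch below (the function name parens_balanced is hypothetical) has to keep a depth counter that can grow without bound, which a fixed set of states cannot represent. A parser's stack plays the same role.

```c
/* Returns 1 if every '(' in s has a matching ')' in the right order,
 * 0 otherwise. Characters other than parentheses are ignored. */
int parens_balanced(const char *s)
{
    long depth = 0;   /* current nesting depth: the extra memory that a
                         finite automaton does not have */

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (--depth < 0)
                return 0;                    /* ')' with no open '(' */
        }
    }
    return depth == 0;                       /* every '(' must be closed */
}
```

For example, "((a)(b))" is accepted, while "(()" and ")(" are rejected.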

Lex
Pattern Matching Primitives

Lex
• Pattern Matching examples.

Lex
• The input structure to Lex:

……Definitions section……
%%
……Rules section……
%%
……C code section (subroutines)……

• ECHO is a predefined macro in lex, used as an action, that writes the text matched by the pattern.

Lex
Lex predefined variables.

Lex
• Whitespace must separate the defining term and the associated expression.
• Code in the definitions section is simply copied as-is to the top of the generated C file and must be bracketed with “%{” and “%}” markers.
• Substitutions in the rules section are surrounded by braces ({letter}) to distinguish them from literals.

Yacc
• Theory:
◦ Yacc reads the grammar and generates C code for a parser.
◦ Grammars are written in Backus Naur Form (BNF).
◦ BNF grammars are used to express context-free languages.
◦ e.g. to parse an expression, do the reverse operation (reducing the expression).
◦ This is known as bottom-up or shift-reduce parsing.
◦ A stack is used for storing (LIFO).
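The shift and reduce steps on a stack can be sketched by hand in C. This is not the table-driven parser that Yacc generates. It assumes a simplified grammar, E -> E + i | i, where the character 'i' stands for an identifier token, and the function name shift_reduce_accepts is hypothetical.

```c
#include <stddef.h>

/* Shift-reduce recognizer for the simplified grammar
 *     E -> E + i | i
 * The input is a string over the token characters 'i' and '+'.
 * Returns 1 if the input reduces to a single E, 0 otherwise. */
int shift_reduce_accepts(const char *tokens)
{
    char stack[128];
    size_t top = 0;                          /* number of symbols on the stack */

    for (; *tokens != '\0'; tokens++) {
        if (*tokens != 'i' && *tokens != '+')
            return 0;                        /* unknown token */
        if (top == sizeof stack)
            return 0;                        /* stack full: give up */

        stack[top++] = *tokens;              /* shift the next token */

        if (top >= 3 && stack[top - 3] == 'E' &&
            stack[top - 2] == '+' && stack[top - 1] == 'i') {
            top -= 2;                        /* reduce: E + i  =>  E */
            stack[top - 1] = 'E';
        } else if (top == 1 && stack[0] == 'i') {
            stack[0] = 'E';                  /* reduce: i  =>  E */
        }
    }
    return top == 1 && stack[0] == 'E';      /* accept only a lone E */
}
```

On the input "i+i+i" the stack goes i, E, E+, E+i, E, E+, E+i, E, and the input is accepted. Inputs such as "i+" or "+i" never reduce to a single E and are rejected.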

STRUCTURE OF YACC

Yacc
• Input to yacc is divided into three sections.

... definitions ...
%%
... rules ...
%%
... subroutines ...

Yacc
• The definitions section consists of:
◦ token declarations.
◦ C code bracketed by “%{” and “%}”.
• The rules section consists of:
◦ BNF grammar.
• The subroutines section consists of:
◦ user subroutines.

Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr * expr | id
• program and expr are nonterminals.
• id is a terminal (a token returned by lex).
• An expression may be:
o the sum of two expressions
o the product of two expressions
o or an identifier

Lex file

Yacc file

Linking Lex & Yacc

Thank You
Gharu.anand@gmail.com
1/22/2018