Unit 1

Published by mariajina on Aug 01, 2011

Language Processors

Jubilant J Kizhakkethottam Assistant Professor Department of Computer Science, SJCET,Pala.

Language Processor

What is it? A language processor is a program that processes programs written in a programming language (as opposed to a natural language).

Basics

Translator

A program that accepts text expressed in one language and generates semantically equivalent text expressed in another language.

Source language: the input language of a translator.
Target language: the output language of a translator.

Basics
Assembler: a translator from an assembly language to the corresponding machine language.
Compiler: a translator from a high-level language to a low-level language.
High-level translator: a translator from one high-level language to another.

Source program: the input text of an assembler or compiler.
Object program: the output text of an assembler or compiler.
Decompiler: a translator from a low-level language to a high-level language.
Disassembler: a translator from machine language to assembly language.

Basics

Implementation language: the language in which a program is expressed.
Tombstone diagram: a graphical representation of the overall function of a system.
Cross compiler: a compiler which generates code for a machine different from the machine on which it runs.

Interpreter: a program expressed in one language which executes programs expressed in another language.
Editor: a program allowing text to be entered and changed.


Machine Language

Each CPU has its own machine language. Eg: A1 66 03 01 D1 74 02

Assembly Language

Eg: MOV AX, [366]
    ADD CX, DX
    JZ 109

Assembler languages use mnemonics for instruction opcodes, such as "ADD", "SUB", "MOV". Assembler languages allow symbolic names for memory locations:

    MOV AX, TOTAL
    JZ SKIP

High-Level Language

High-level language example:

    if (Total == 0)
        Daily = Weekly / 7;

High-level languages are not system specific: the same high-level program can be run on many different systems.
High-level languages support the use of complex expressions. Example: x + y * z / (w + 1)

High-level languages enforce type rules, ensuring that the types of operands and results are compatible.
High-level languages support the use of control structures. Example:

    while (X > 0) { Total = Total + Item[X]; X = X - 1; }

High-level languages support the use of control abstraction, in the form of named subprograms.
High-level languages support the use of data abstraction, hiding the implementation details of operations.

Comparison

High Level Languages:
    Object Oriented Languages   Symbolic   Smalltalk, C++, Java
    Logic Languages             Symbolic   Prolog, SQL
    Functional Languages        Symbolic   Lisp, Scheme
    Imperative Languages        Symbolic   FORTRAN, COBOL, C
    Report Generators           Symbolic   RPG, DYL260
Low Level Languages:
    Assembler Languages         Symbolic   MASM, TASM
    Machine Languages           Numeric    Pentium

Compiler

Phases of a Compiler

Lexical Analyzer

The lexical analyzer reads the source program character by character to produce tokens. Normally a lexical analyzer doesn't return a list of tokens in one shot; it returns a token when the parser asks for one.

    source program → Lexical Analyzer → token → Parser
                     (the parser calls "get next token")
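This on-demand behavior can be sketched in Python with a generator. The token names and patterns below are illustrative, not from the slides:

```python
import re

# Illustrative token patterns (not from the slides).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

def tokenize(source):
    """Yield (token_kind, lexeme) pairs one at a time, as a parser requests them."""
    pattern = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        kind = match.lastgroup
        if kind != "SKIP":            # whitespace is discarded, not returned
            yield kind, match.group()

tokens = tokenize("total = total + 1")
print(next(tokens))   # the parser pulls one token at a time on demand
```

Because `tokenize` is a generator, no token is produced until the parser asks for the next one, matching the "get next token" protocol above.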

Token

A token represents a set of strings described by a pattern. For example, an identifier represents the set of strings which start with a letter and continue with letters and digits. The actual string is called a lexeme.

Tokens: identifier, number, delimiter, …

Since a token can represent more than one lexeme, additional information should be held for that specific lexeme. This additional information is called the attribute of the token. For simplicity, a token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.


Syntax Analyzer

The syntax analyzer is also known as the parser. It creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree.

The syntax of a programming language is described by a context-free grammar (CFG). A context-free grammar gives a precise syntactic specification of a programming language. We will use BNF (Backus-Naur Form) notation in the description of CFGs.

The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. If it satisfies, the parser creates the parse tree of that program. Otherwise the parser gives error messages.

Parser

The parser works on a stream of tokens. The smallest item is a token.

    source program → Lexical Analyzer → token → Parser → parse tree
                     (the parser calls "get next token")

Parsers (cont.)

We categorize parsers into two groups:

1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to right, one symbol at a time.

Parse Tree

The inner nodes of a parse tree are non-terminal symbols; the leaves of a parse tree are terminal symbols. A parse tree can be seen as a graphical representation of a derivation:

    E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Context-Free Grammars

Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have:

● A finite set of terminals (in our case, this will be the set of tokens)
● A finite set of non-terminals (syntactic variables)
● A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly empty)
● A start symbol (one of the non-terminal symbols)

Derivations

E → E+E

E+E derives from E: we can replace E by E+E. To be able to do this, we have to have the production rule E → E+E in our grammar.

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar; ⇒* denotes a derivation in zero or more steps, and ⇒+ denotes a derivation in one or more steps.

Left-Most and Right-Most Derivations

Left-Most Derivation:
    E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation:
    E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)

We will see that top-down parsers try to find the left-most derivation of the given source program, and that bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.

Derivation Example

    E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
    E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Ambiguity

A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar. For example, the sentence id+id*id has two leftmost derivations:

    E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
    E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

each corresponding to a different parse tree.

Ambiguity (cont.)

For most parsers, the grammar must be unambiguous:

    unambiguous grammar → unique selection of the parse tree for a sentence

We should eliminate the ambiguity in the grammar during the design phase of the compiler: an unambiguous grammar should be written to eliminate the ambiguity.

Left Recursion

A grammar is left recursive if it has a non-terminal A such that there is a derivation

    A ⇒+ Aα    for some string α

Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive. The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation.

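The standard elimination of immediate left recursion rewrites A → Aα | β as A → βA', A' → αA' | ε. A minimal sketch in Python (the grammar representation as lists of symbols is my own, not from the slides):

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | eps.

    Each production is a list of symbols; "eps" stands for the empty string.
    """
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    non_recursive = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}       # no immediate left recursion
    fresh = nonterminal + "'"                   # the new non-terminal A'
    return {
        nonterminal: [p + [fresh] for p in non_recursive],
        fresh: [p + [fresh] for p in recursive] + [["eps"]],
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

This handles only the immediate case; left recursion spread across several derivation steps needs the general substitution-based algorithm.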

Recursive-Descent Parsing (uses Backtracking)

Backtracking is needed. The parser tries to find the left-most derivation.

    S → aBc
    B → bc | b

    input: abc

The parser first expands B using B → bc, which fails to match the remaining input, so it backtracks and tries B → b, which succeeds.

Top-Down Parsing

The parse tree is created top to bottom.

Recursive-Descent Parsing
    ● Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
    ● It is a general parsing technique, but not widely used because it is not efficient.

Predictive Parsing
    ● No backtracking; efficient.

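The S → aBc, B → bc | b example above can be sketched as a backtracking recursive-descent parser in Python (the representation is mine):

```python
def parse_B(s, pos):
    """Yield each position B can reach from pos; the yield order encodes
    which alternative is tried first (B -> bc before B -> b)."""
    if s[pos:pos + 2] == "bc":
        yield pos + 2                 # alternative 1: B -> bc
    if s[pos:pos + 1] == "b":
        yield pos + 1                 # alternative 2: B -> b

def parse_S(s):
    """S -> aBc: match 'a', then B, then 'c', backtracking over B's alternatives."""
    if s[:1] != "a":
        return False
    for pos in parse_B(s, 1):         # try B -> bc; if 'c' doesn't follow, backtrack
        if s[pos:] == "c":
            return True
    return False

print(parse_S("abc"))    # True: B -> bc fails, backtracking to B -> b succeeds
print(parse_S("abcc"))   # True: B -> bc succeeds directly
```

On input "abc" the first alternative consumes "bc" and leaves nothing for the trailing 'c', so the loop falls through to the next alternative, which is exactly the backtracking step of the slide.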

Bottom-Up Parsing

A bottom-up parser creates the parse tree of the given input starting from the leaves towards the root. It tries to find the right-most derivation of the given input in the reverse order:

    S ⇒ ... ⇒ ω    (the right-most derivation of ω)
    (the bottom-up parser finds this right-most derivation in the reverse order)

Bottom-up parsing is also known as shift-reduce parsing.

Shift-Reduce Parsing

A shift-reduce parser tries to reduce the given input string ω into the starting symbol S:

    ω  — reduced to →  S

At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule. If the substring is chosen correctly, the right-most derivation of that string is created in the reverse order.

    Rightmost derivation:  S ⇒rm* ω

Shift-Reduce Parsing - Example

Grammar:
    S → aABb
    A → aA | a
    B → bB | b

Input string: aaabb

Reductions:
    aaabb → aaAbb → aAbb → aABb → S

Right sentential forms (the right-most derivation in reverse):
    S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb

Handle

Informally, a handle of a string is a substring that matches the right side of a production rule. But not every substring that matches the right side of a production rule is a handle.

A handle of a right sentential form γ (= αβω) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right sentential form in a right-most derivation of γ:

    S ⇒rm* αAω ⇒rm αβω

Handle Pruning

A right-most derivation in reverse can be obtained by handle-pruning:

    S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = input string

Start from γn: find a handle An → βn in γn and replace βn by An to get γn-1. Then find a handle An-1 → βn-1 in γn-1 and replace βn-1 by An-1 to get γn-2. Repeat until we reach the start symbol S.

A Shift-Reduce Parser

Grammar:
    E → E+T | T
    T → T*F | F
    F → (E) | id

Right-most derivation of id+id*id:
    E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

    Right-Most Sentential Form    Reducing Production
    id+id*id                      F → id
    F+id*id                       T → F
    T+id*id                       E → T
    E+id*id                       F → id
    E+F*id                        T → F
    E+T*id                        F → id
    E+T*F                         T → T*F
    E+T                           E → E+T
    E

A Stack Implementation of a Shift-Reduce Parser

There are four possible actions of a shift-reduce parser:

1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: replace the handle on the top of the stack by the non-terminal.
3. Accept: successful completion of parsing.
4. Error: the parser discovers a syntax error and calls an error recovery routine.
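The aaabb example can be traced with a tiny stack-based sketch in Python. The shift/reduce decisions below are hard-coded for this one input; a real parser would consult a parse table to choose them:

```python
# Parsing "aaabb" with S -> aABb, A -> aA | a, B -> bB | b.
# The action sequence a table-driven parser would choose, hard-coded here.
ACTIONS = [
    ("shift", None), ("shift", None), ("shift", None),   # stack: a a a
    ("reduce", ("A", ["a"])),                            # stack: a a A
    ("reduce", ("A", ["a", "A"])),                       # stack: a A
    ("shift", None),                                     # stack: a A b
    ("reduce", ("B", ["b"])),                            # stack: a A B
    ("shift", None),                                     # stack: a A B b
    ("reduce", ("S", ["a", "A", "B", "b"])),             # stack: S -> accept
]

def run(input_string, actions):
    """Execute shift/reduce actions, printing the stack and remaining input."""
    stack, rest = [], list(input_string)
    for action, prod in actions:
        if action == "shift":
            stack.append(rest.pop(0))
        else:                                   # reduce: pop the handle, push the LHS
            lhs, rhs = prod
            assert stack[-len(rhs):] == rhs, "handle not on top of stack"
            del stack[-len(rhs):]
            stack.append(lhs)
        print(action.ljust(6), "stack:", "".join(stack), " input:", "".join(rest))
    return stack == ["S"] and not rest          # accept iff only S remains

print(run("aaabb", ACTIONS))
```

Note that each reduce pops the handle from the top of the stack, exactly as action 2 above describes; accept corresponds to the stack holding only S with the input exhausted.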

Intermediate Code Generation

Intermediate codes are machine-independent codes, but they are close to machine instructions. The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator. The intermediate language can be one of many different languages, and the designer of the compiler decides which intermediate language to use.

Three-Address Code (Quadruples)

A quadruple is:

    x := y op z

where x, y and z are names, constants or compiler-generated temporaries, and op is any operator. But we may also use the following notation for quadruples (a much better notation, because it looks like a machine code instruction):

    op y,z,x

Three-Address Statements

Binary operator:    op y,z,result    or    result := y op z

where op is a binary arithmetic or logical operator. The operator is applied to y and z, and the result of the operation is stored in result.

    Ex: add a,b,c
        gt a,b,c
        addr a,b,c
        addi a,b,c

Unary operator:    op y,result    or    result := op y

where op is a unary arithmetic or logical operator.
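A minimal sketch of generating such quadruples for an arithmetic expression. The nested-tuple AST shape and the temporary-naming scheme are my own, not from the slides:

```python
import itertools

temp_counter = itertools.count(1)

def gen(node, code):
    """Emit 'op y,z,result' quadruples for a nested-tuple AST; return the
    name holding the node's value."""
    if isinstance(node, str):                 # a name or constant: nothing to emit
        return node
    op, left, right = node                    # e.g. ("add", "a", ("mul", "b", "c"))
    y, z = gen(left, code), gen(right, code)
    result = f"t{next(temp_counter)}"         # compiler-generated temporary
    code.append(f"{op} {y},{z},{result}")
    return result

code = []
gen(("add", "a", ("mul", "b", "c")), code)    # a + b * c
for quad in code:
    print(quad)
```

The inner operation is emitted first into a temporary, so `a + b * c` produces `mul b,c,t1` followed by `add a,t1,t2`.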

Arrays

Elements of arrays can be accessed quickly if the elements are stored in a block of consecutive locations. For a one-dimensional array A:

    baseA is the address of the first location of the array A,
    low is the index of the first array element,
    width is the width of each array element.

The address of A[i] is then baseA + (i - low) * width.
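The address calculation can be checked with a one-line sketch (the example base address, bounds and width are mine):

```python
def element_address(base, low, width, i):
    """Address of A[i] when A starts at `base`, the first index is `low`,
    and each element occupies `width` address units."""
    return base + (i - low) * width

# A 4-byte-element array starting at address 1000, indexed from 1:
print(element_address(1000, 1, 4, 1))   # 1000: the first element sits at the base
print(element_address(1000, 1, 4, 5))   # 1016: four elements past the base
```

In generated code, the compile-time-constant part base - low * width is usually folded into a single constant so only i * width is computed at run time.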

Run-Time Storage - Memory Organization

Code:        locations for code are determined at compile time.
Static Data: locations of static data can also be determined at compile time.
Stack:       data objects allocated at run-time (activation records).
Heap:        other dynamically allocated data objects at run-time (for example, the malloc area in C).

Activation Records

Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record. An activation record is allocated when a procedure is entered, and it is de-allocated when that procedure is exited.

The size of each field can be determined at compile time (although the actual location of the activation record is determined at run-time), except that if the procedure has a local variable whose size depends on a parameter, its size is determined at run time.

Activation Records (cont.)

    return value
    actual parameters
    optional control link
    optional access link
    saved machine status
    local data
    temporaries

The returned value of the called procedure is returned in the return value field to the calling procedure; in practice, we may use a machine register for the return value.
The field for actual parameters is used by the calling procedure to supply parameters to the called procedure.
The optional control link points to the activation record of the caller.
The optional access link is used to refer to non-local data held in other activation records.
The field for saved machine status holds information about the state of the machine before the procedure is called.
The field of local data holds data that is local to an execution of the procedure.
Temporary variables are stored in the field of temporaries.
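The layout can be sketched as a data structure. The field names follow the list above; the example call and values are illustrative only:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    """One procedure activation; fields mirror the slide's layout."""
    return_value: Any = None
    actual_parameters: list = field(default_factory=list)
    control_link: Optional["ActivationRecord"] = None   # caller's record
    access_link: Optional["ActivationRecord"] = None    # for non-local data
    saved_machine_status: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)

# Suppose main calls f(3): f's control link points back at main's record.
main_ar = ActivationRecord(local_data={"x": 10})
f_ar = ActivationRecord(actual_parameters=[3], control_link=main_ar)
f_ar.return_value = f_ar.actual_parameters[0] * 2   # f returns 6 to its caller
print(f_ar.return_value)
```

Pushing and popping such records on a list models the run-time stack: allocation on procedure entry, de-allocation on exit.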

Parameter Passing Methods

    Call-by-value
    Call-by-reference
    Call-by-name (used by Algol)
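Python itself passes object references by value, but the observable difference between the first two methods can still be illustrated with it; the one-element-list "cell" standing in for a memory location is my own device, not a slide example:

```python
def increment_by_value(n):
    """Call-by-value: the callee works on a copy; the caller's variable is untouched."""
    n = n + 1
    return n

def increment_by_reference(cell):
    """Simulated call-by-reference: the callee updates the caller's storage
    through a shared one-element list acting as a memory cell."""
    cell[0] = cell[0] + 1

x = 5
increment_by_value(x)
print(x)            # still 5: only the callee's copy changed

box = [5]
increment_by_reference(box)
print(box[0])       # 6: the caller observes the update
```

Call-by-name (as in Algol) is different again: the argument expression is re-evaluated at each use inside the callee, which neither of these sketches models.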
