ASSIGNMENT – 01/02

Name                 : Ajay Kumar
Registration No      : 571013064
Learning Center      : GLACE
Learning Center Code : 02815
Course               :
Subject              : MC0073
Semester             : III
Module No            :
Date of Submission   :
Marks Awarded        :

Signature of Evaluator

Signature of Center Coordinator

Directorate of Distance Education, Sikkim Manipal University, II Floor, Syndicate House, Manipal – 576 104

Assignment Set – 1

1. Describe the following with respect to Language Specification: A) Programming Language Grammars B) Classification of Grammars C) Binding and Binding Times

Ans:

A) Programming Language Grammars

The lexical and syntactic features of a programming language are specified by its grammar. This section discusses key concepts and notions from formal language grammars. A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language. A formal language grammar is a set of rules which precisely specify the sentences of L. Natural languages are not formal languages due to their rich vocabulary; however, programming languages are formal languages.

Terminal symbols, alphabet and strings

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g.

Σ = {a, b, …, z, 0, 1, …, 9}

Here the symbols {, ',' and } are part of the notation; we call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, when defining the set of punctuation symbols of English, ',' would be written in quotes to denote the terminal symbol 'comma'.

A string is a finite sequence of symbols. We represent strings by Greek symbols α, β, γ, etc. Thus α = axy is a string over Σ. The length of a string is the number of symbols in it. Note that the absence of any symbol is also a string, the null string ε. The concatenation operation combines two strings into a single string and is used to build larger strings from existing strings. Given two strings α and β, concatenation of α with β yields a string formed by putting the sequence of symbols forming α before the sequence of symbols forming β. For example, if α = ab and β = axy, then the concatenation of α and β, written α.β or simply αβ, gives the string abaxy. The null string can also participate in a concatenation: α.ε = ε.α = α.

Nonterminal symbols

A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between < … >, e.g. A or <Noun>. During grammatical analysis, a nonterminal symbol represents an instance of the category; thus <Noun> represents a noun.

Productions

A production, also called a rewriting rule, is a rule of the grammar. A production has the form

<Nonterminal symbol> ::= String of Ts and NTs

and defines the fact that the NT on the LHS of the production can be rewritten as the string of Ts and NTs appearing on the RHS. When an NT can be rewritten as one of many different strings, the symbol '|' (standing for 'or') is used to separate the strings on the RHS, e.g.

<Article> ::= a | an | the

The string on the RHS of a production can be a concatenation of component strings, e.g. the production <Noun Phrase> ::= <Article> <Noun> expresses the fact that a noun phrase consists of an article followed by a noun.
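As an added illustration (not part of the original answer), the noun-phrase grammar above can be recognized by a tiny recursive-descent routine, one function per production. The sketch below is written in C; the token array, function names and the choice of recursive descent are assumptions made purely for this example.

    #include <stdio.h>
    #include <string.h>

    /* Illustrative sketch: recognizer for
       <Noun Phrase> ::= <Article> <Noun>
       <Article>     ::= a | an | the
       <Noun>        ::= boy | apple
       The input is assumed to be already split into word tokens. */

    static const char *tokens[] = {"the", "boy"};   /* input: "the boy" */
    static int pos = 0, ntok = 2;

    static int match(const char *candidates[], int n) {
        if (pos >= ntok) return 0;
        for (int i = 0; i < n; i++)
            if (strcmp(tokens[pos], candidates[i]) == 0) { pos++; return 1; }
        return 0;                                   /* no alternative matched */
    }

    static int article(void) {
        const char *alts[] = {"a", "an", "the"};
        return match(alts, 3);
    }

    static int noun(void) {
        const char *alts[] = {"boy", "apple"};
        return match(alts, 2);
    }

    /* One production, one function: <Noun Phrase> ::= <Article> <Noun> */
    static int noun_phrase(void) {
        return article() && noun();
    }

    int main(void) {
        printf("valid noun phrase: %s\n",
               (noun_phrase() && pos == ntok) ? "yes" : "no");
        return 0;
    }

Running it on the token sequence "the boy" reports a valid noun phrase, mirroring the derivation example discussed next.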

Definition (Grammar): A grammar G of a language LG is a quadruple (Σ, SNT, S, P), where Σ is the alphabet of LG, i.e. the set of Ts, SNT is the set of NTs, S is the distinguished symbol, and P is the set of productions. Each grammar G defines a language LG. G contains an NT called the distinguished symbol or the start NT of G; unless otherwise specified, we use the symbol S as the distinguished symbol of G.

A valid string α of LG is obtained by using the following procedure:
1. Let α = 'S'.
2. While α is not a string of terminal symbols:
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.

Example: Grammar (1.1) defines a language consisting of noun phrases in English:
<Noun Phrase> ::= <Article> <Noun>
<Article> ::= a | an | the
<Noun> ::= boy | apple
<Noun Phrase> is the distinguished symbol of the grammar; the boy and an apple are some valid strings in the language.

Derivation, reduction and parse trees

A grammar G is used for two purposes: to generate valid strings of LG and to 'recognize' valid strings of LG. The derivation operation helps to generate valid strings while the reduction operation helps to recognize valid strings. A parse tree is used to depict the syntactic structure of a valid string as it emerges during a sequence of derivations or reductions.

Derivation: Let production P1 of grammar G be of the form A ::= π, and let σ be a string such that σ = γAθ. Then replacement of A by π in the string σ constitutes a derivation according to production P1. We use the notation N ⇒ η to denote direct derivation of η from N, and N ⇒* η to denote transitive derivation of η (i.e. derivation in zero or more steps) from N. Thus A ⇒ π only if A ::= π is a production of G, and A ⇒* δ if A ⇒ … ⇒ δ. We can use this notation to define a valid string according to a grammar G as follows: δ is a valid string according to G only if S ⇒* δ, where S is the distinguished symbol of G. A string α such that S ⇒* α is a sentential form of LG; the string α is a sentence of LG if it consists only of Ts.

Example: Derivation of the string the boy according to grammar (1.1) can be depicted as
<Noun Phrase> ⇒ <Article> <Noun> ⇒ the <Noun> ⇒ the boy

Example: Consider the grammar G:
<Sentence> ::= <Noun Phrase> <Verb Phrase>
<Noun Phrase> ::= <Article> <Noun>
<Verb Phrase> ::= <Verb> <Noun Phrase>
<Article> ::= a | an | the
<Noun> ::= boy | apple
<Verb> ::= ate

The following strings are sentential forms of LG:
<Noun Phrase> <Verb Phrase>
the boy <Verb Phrase>
<Noun Phrase> ate <Noun Phrase>
the boy ate <Noun Phrase>
the boy ate an apple
However, only the boy ate an apple is a sentence.

Reduction: To determine the validity of the string the boy ate an apple according to this grammar, we perform the following reductions:
Step  String
      the boy ate an apple
1     <Article> boy ate an apple
2     <Article> <Noun> ate an apple
3     <Article> <Noun> <Verb> an apple
4     <Article> <Noun> <Verb> <Article> apple
5     <Article> <Noun> <Verb> <Article> <Noun>
6     <Noun Phrase> <Verb> <Article> <Noun>
7     <Noun Phrase> <Verb> <Noun Phrase>
8     <Noun Phrase> <Verb Phrase>
9     <Sentence>
The string is a sentence of LG since we are able to construct the reduction sequence the boy ate an apple ⇒ … ⇒ <Sentence>.

Parse trees: A sequence of derivations or reductions reveals the syntactic structure of a string with respect to G. We depict the syntactic structure in the form of a parse tree. Derivation according to the production A ::= π gives rise to an elemental parse tree with A as the root and the symbols of π as its children.

B) Classification of Grammars

Grammars are classified on the basis of the nature of productions used in them (Chomsky, 1963). Each grammar class has its own characteristics and limitations.

Type-0 grammars: These grammars, known as phrase structure grammars, contain productions of the form α ::= β, where both α and β can be strings of Ts and NTs. Such productions permit arbitrary substitution of strings during derivation or reduction; hence they are not relevant to specification of programming languages.

Type-1 grammars: These grammars are known as context sensitive grammars because their productions specify that derivation or reduction of strings can take place only in specific contexts. A Type-1 production has the form αAβ ::= απβ; thus, a string π in a sentential form can be replaced by 'A' (or vice versa) only when it is enclosed by the strings α and β. These grammars are also not particularly relevant for PL specification since recognition of PL constructs is not context sensitive in nature.

Type-2 grammars: These grammars impose no context requirements on derivations or reductions. A typical Type-2 production is of the form A ::= π, which can be applied independent of its context. These grammars are therefore known as context free grammars (CFG). CFGs are ideally suited for programming language specification.

Type-3 grammars: Type-3 grammars are characterized by productions of the form
A ::= tB | t    or    A ::= Bt | t
Note that these productions also satisfy the requirements of Type-2 grammars. The specific form of the RHS alternatives, namely a single T or a string containing a single T and a single NT, gives some practical advantages in scanning. Type-3 grammars are also known as linear grammars or regular grammars. These are further categorized into left-linear and right-linear grammars depending on whether the NT in the RHS alternative appears at the extreme left or extreme right.

Operator grammars: Definition (Operator grammar (OG)): An operator grammar is a grammar none of whose productions contain two or more consecutive NTs in any RHS alternative. Thus, nonterminals occurring in an RHS string are separated by one or more terminal symbols. All terminal symbols occurring in the RHS strings are called operators of the grammar.

C) Binding and Binding Times

Definition (Binding): A binding is the association of an attribute of a program entity with a value. Binding time is the time at which a binding is performed. We are interested in the following binding times, where L is a programming language, P is a program written in L and proc is a procedure in P:
1. Language definition time of L
2. Language implementation time of L
3. Compilation time of P
4. Execution init time of proc
5. Execution time of proc

The preceding list of binding times is not exhaustive; other binding times can be defined, viz. binding at the linking time of P. Note that language implementation time is the time when a language translator is designed. The language definition of L specifies binding times for the attributes of various entities of programs written in L.

Binding of the keywords of Pascal to their meanings is performed at language definition time. This is how keywords like program, procedure, begin and end get their meanings. These bindings apply to all programs written in Pascal. At language implementation time, the compiler designer performs certain bindings; for example, the size of type 'integer' is bound to n bytes, where n is a number determined by the architecture of the target machine. Binding of the type attributes of variables is performed at compilation time of P: the type attribute of a variable var is bound to a type when its declaration is processed, and the size attribute of that type is bound to a value sometime prior to this binding. The memory addresses of the local variables info and p of procedure proc are bound at every execution init time of procedure proc. The value attributes of variables are bound (possibly more than once) during an execution of proc. The memory address of p↑ is bound when the procedure call new(p) is executed.

Static and dynamic bindings

Definition (Static binding): A static binding is a binding performed before the execution of a program begins.

Definition (Dynamic binding): A dynamic binding is a binding performed after the execution of a program has begun.
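As an illustrative aside (added here, not taken from the original answer), the same binding times can be pointed out in a small C fragment; the variable and function names are invented for the example, and the comments mark which binding happens when.

    #include <stdlib.h>

    int counter;                     /* the meaning and size of 'int' are fixed at language
                                        definition / implementation time; the type attribute
                                        of counter is bound at compilation time */

    void proc(void) {
        int info;                    /* the address of the local 'info' is bound at every
                                        execution init time of proc (stack allocation) */
        int *p = malloc(sizeof *p);  /* the address held in p is a dynamic binding:
                                        it is performed only when malloc executes */
        info = 42;                   /* the value attribute of info is bound (and may be
                                        rebound) during execution of proc */
        *p = info;
        free(p);
    }

    int main(void) { proc(); return 0; }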

2. What are data formats? Explain ASCII data formats.

Ans: Data format in information technology can refer to any one of:
· Data type, a constraint placed upon the interpretation of data in a type system
· Signal (electrical engineering), a format for signal data used in signal processing
· Recording format, a format for encoding data for storage on a storage medium
· File format, a format for encoding data for storage in a computer file
· Container format (digital), a format for encoding data for storage by means of standardized audio/video codec file formats
· Content format, a format for converting data to information
· Audio format, a format for processing audio data
· Video format, a format for processing video data

ASCII data formats

ASCII is an acronym for the American Standard Code for Information Interchange. Pronounced ask-ee, ASCII is a code for representing English characters as numbers, with each letter assigned a number from 0 to 127. For example, the ASCII code for uppercase M is 77. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another. For a list of commonly used characters and their ASCII equivalents, refer to the ASCII page in the Quick Reference section.

Text editors and word processors are usually capable of storing data in ASCII format, although ASCII format is not always the default storage format. Text files stored in ASCII format are sometimes called ASCII files. Most data files, particularly if they contain numeric data, are not stored in ASCII format. Executable programs are never stored in ASCII format.

The standard ASCII character set uses just 7 bits for each character. There are several larger character sets that use 8 bits, which gives them 128 additional characters. The extra characters are used to represent non-English characters, graphics symbols, and mathematical symbols. Several companies and organizations have proposed extensions for these 128 characters. The DOS operating system uses a superset of ASCII called extended ASCII or high ASCII. A more universal standard is the ISO Latin 1 set of characters, which is used by many operating systems, as well as Web browsers.
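A quick illustration (added here, not part of the original answer): in C a character is stored as its ASCII code, so the same byte can be printed either as a character or as a number.

    #include <stdio.h>

    int main(void) {
        char c = 'M';
        /* In ASCII, 'M' is stored as the number 77, so the two
           printf calls below show the same byte in different forms. */
        printf("character: %c\n", c);   /* prints M  */
        printf("code     : %d\n", c);   /* prints 77 */
        return 0;
    }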

3. Explain the following with respect to the design specifications of an Assembler: A) Data Structures B) Pass 1 & Pass 2 Assembler flow chart

Ans:

A) Data Structures

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.
6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assigned location.
7. A copy of the input to be used by pass 2.

Pass 2 Data Structures
1. Copy of the source program input to pass 1.
2. Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates for each instruction its symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and format of instruction.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A work space, INST, that is used to hold each instruction as its various parts are being assembled together.
8. A work space, PRINT LINE, used to produce a printed listing.
9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. The Machine-Operation Table (MOT) and the Pseudo-Operation Table (POT) are examples of fixed tables: their contents are not filled in or altered during the assembly process. Instead of using two different tables for the two passes (pass 1 requires only the mnemonic name and length, while pass 2 requires the name, length, binary code and format), we construct a single MOT. The following figure depicts the format of the machine-op table (MOT), with 6 bytes per entry; 'b' represents "blank":

Mnemonic opcode (4 bytes, characters) | Binary opcode (1 byte, hexadecimal) | Instruction length (2 bits, binary) | Instruction format (3 bits, binary) | Not used here (3 bits)
"Abbb"   5A   10   001
"AHbb"   4A   10   001
"ALbb"   5E   10   001
"ALRb"   1E   01   000
……
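To make the table formats more concrete, here is a small C sketch added to this answer. The struct layouts loosely mirror the 6-byte MOT entry and a symbol table entry described above; the field names, sizes and the bit packing of the length/format fields are assumptions made for the illustration, not the assembler's actual declarations.

    #include <stdint.h>
    #include <stdio.h>

    /* One 6-byte MOT entry: 4 mnemonic characters, a 1-byte binary opcode,
       and one byte packing the 2-bit length and 3-bit format fields. */
    struct mot_entry {
        char    mnemonic[4];   /* e.g. "Abbb", 'b' standing for blank          */
        uint8_t opcode;        /* e.g. 0x5A for A (Add)                        */
        uint8_t len_fmt;       /* bits 7-6: length, bits 5-3: format, rest unused */
    };

    /* A symbol table (ST) entry as filled in by pass 1 and read by pass 2. */
    struct st_entry {
        char     label[8];     /* symbol name                                  */
        unsigned value;        /* address assigned during LC processing        */
        unsigned length;       /* length attribute of the symbol               */
    };

    /* A few sample MOT rows taken from the figure above. */
    static const struct mot_entry mot[] = {
        { {'A','b','b','b'}, 0x5A, 0x88 },   /* length 10, format 001 packed into one byte */
        { {'A','H','b','b'}, 0x4A, 0x88 },
        { {'A','L','R','b'}, 0x1E, 0x40 },   /* length 01, format 000                       */
    };

    int main(void) {
        printf("%.4s -> opcode %02X\n", mot[0].mnemonic, (unsigned)mot[0].opcode);
        return 0;
    }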

B) Pass 1 & Pass 2 Assembler flow chart

Pass Structure of Assemblers

Here we discuss two pass and single pass assembly schemes.

Two pass translation: Two pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

Look at the following program and the target code generated for it:

            START   101
            READ    N               101)  + 09 0 113
            MOVER   BREG, ONE       102)  + 04 2 115
            MOVEM   BREG, TERM      103)  + 05 2 116
    AGAIN   MULT    BREG, TERM      104)  + 03 2 116
            MOVER   CREG, TERM      105)  + 04 3 116
            ADD     CREG, ONE       106)  + 01 3 115
            MOVEM   CREG, TERM      107)  + 05 3 116
            COMP    CREG, N         108)  + 06 3 113
            BC      LE, AGAIN       109)  + 07 2 104
            MOVEM   BREG, RESULT    110)  + 05 2 114
            PRINT   RESULT          111)  + 10 0 114
            STOP                    112)  + 00 0 000
    N       DS      1               113)
    RESULT  DS      1               114)
    ONE     DC      '1'             115)  + 00 0 001
    TERM    DS      1               116)
            END

Single pass translation: LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially; the address of the forward referenced symbol is put into this field when its definition is encountered. In the above program, the instruction corresponding to the statement MOVER BREG, ONE can be only partially synthesized since ONE is a forward reference. Hence the instruction opcode and the address of BREG will be assembled to reside in location 102. The need for inserting the second operand's address at a later stage is indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (102, ONE) in this case. By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (102, ONE) would be processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 102. Alternatively, entries in TII can be processed in an incremental manner: whenever the definition of some symbol symb is encountered, all forward references to symb can be processed.
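The backpatching idea can be sketched in a few lines of C (added here as an illustration; the table sizes, structure names and the tiny instruction encoding are invented for the example and are not the assembler's real data structures):

    #include <stdio.h>
    #include <string.h>

    /* Sketch of backpatching with a Table of Incomplete Instructions (TII):
       an instruction with a forward reference gets a blank operand field plus
       a TII entry, and the entries are patched once assembly reaches END. */

    struct instr     { int opcode, reg, operand; };       /* operand 0 = blank */
    struct tii_entry { int instr_addr; char symbol[8]; } tii[64];
    struct sym_entry { char name[8]; int addr; } symtab[64];
    int ntii = 0, nsym = 0;

    static int lookup(const char *name) {
        for (int i = 0; i < nsym; i++)
            if (strcmp(symtab[i].name, name) == 0) return symtab[i].addr;
        return -1;                                        /* not yet defined  */
    }

    /* Called while assembling an operand that is still undefined. */
    static void note_forward_ref(int instr_addr, const char *symbol) {
        tii[ntii].instr_addr = instr_addr;
        strcpy(tii[ntii].symbol, symbol);
        ntii++;
    }

    /* Called after END: fill in every blank operand field. */
    static void backpatch(struct instr *code, int base_addr) {
        for (int i = 0; i < ntii; i++)
            code[tii[i].instr_addr - base_addr].operand = lookup(tii[i].symbol);
    }

    int main(void) {
        struct instr code[16] = { [1] = { 4, 2, 0 } };    /* MOVER BREG, <blank> at 102 */
        note_forward_ref(102, "ONE");
        strcpy(symtab[nsym].name, "ONE"); symtab[nsym].addr = 115; nsym++;
        backpatch(code, 101);
        printf("patched operand at 102: %d\n", code[1].operand);  /* prints 115 */
        return 0;
    }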

Design of a Two Pass Assembler

Tasks performed by the passes of a two pass assembler are as follows:

Pass I:
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation, while Pass II processes the intermediate representation to synthesize the target program. The design details of the assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.

4. Explain the following: a) Lexical Analysis b) Syntax Analysis.

Ans:

Lexical Analysis

The lexical analyzer is the interface between the source program and the compiler. The lexical analyzer reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses are typical tokens. There are two kinds of token: specific strings such as IF or a semicolon, and classes of strings such as identifiers, constants, or labels.

Syntax Analysis

The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler.
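To illustrate the lexical analyzer's job of carving the source into tokens, here is a small hand-written scanner sketch in C (added for illustration only; the token classes are deliberately simplified and the input string is made up):

    #include <ctype.h>
    #include <stdio.h>

    /* Minimal scanner sketch: classifies each lexeme of the input as an
       identifier, a constant, or a single-character operator/punctuation.
       Real lexical analyzers also handle keywords, strings, comments, etc. */
    int main(void) {
        const char *src = "sum = sum + 42;";
        for (const char *p = src; *p; ) {
            if (isspace((unsigned char)*p)) { p++; continue; }
            if (isalpha((unsigned char)*p)) {                 /* identifier */
                const char *start = p;
                while (isalnum((unsigned char)*p)) p++;
                printf("IDENTIFIER  %.*s\n", (int)(p - start), start);
            } else if (isdigit((unsigned char)*p)) {          /* constant   */
                const char *start = p;
                while (isdigit((unsigned char)*p)) p++;
                printf("CONSTANT    %.*s\n", (int)(p - start), start);
            } else {                                          /* operator or punctuation */
                printf("OPERATOR    %c\n", *p);
                p++;
            }
        }
        return 0;
    }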

5. Describe the process of Bootstrapping in the context of Linkers?

Ans: In computing, bootstrapping refers to a process where a simple system activates another more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading

The discussions of loading up to this point have all presumed that there's already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how the first program is loaded into the computer. In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as the bootstrap ROM, as in "pulling one's self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system's address space. The bootstrap ROM occupies the top 64K of the address space, and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can't fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it's tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example); others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g., ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language, and so on, until one can have a graphical IDE and an extremely high-level programming language.

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and, more recently, the Mono C# compiler.

6. Describe the procedure for design of a Linker?

Ans: Design of a linker

Relocation and linking requirements in segmented addressing

The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of a segmented addressing structure reduces the relocation requirements of a program.

Implementation Example: A Linker for MS-DOS

Example: Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing; hence all memory addressing is performed by using suitable displacements from their contents. The translation time address of A is 0196. In statement 14, a reference to A is assembled as a displacement of 196 from the contents of the CS register, hence the instruction is not address sensitive. The effective operand address would be calculated as <CS> + 0196, which is the correct address 2196 when the segment is loaded at 2000. Thus no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS).

In statement 16, the instruction

    MOV AX, DATA_HERE

makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register, i.e. it loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it assembles the MOV instruction in the immediate operand format; however, it does not know the link time address of DATA_HERE, so it puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker would put the appropriate address in the operand field. A reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address; this avoids the use of an absolute address. A similar situation exists with the reference to B in statement 17.

Though the use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format; inter-segment calls and jumps are handled in a similar way. For example, consider the following program:

    FAR_LAB  EQU  THIS FAR     ; FAR_LAB is a FAR label
             JMP  FAR_LAB      ; a FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes, which are to hold the segment base address. A statement like

    ADDR_A   DW   OFFSET A

(which is an 'address constant') does not need any relocation since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address. For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker; hence there is no reduction in the linking requirements.
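As an added illustration of what the linker does with such RELOCTAB entries (a simplified sketch, not the actual MS-DOS linker's data structures; the structure names and the example values are invented):

    #include <stdint.h>
    #include <stdio.h>

    /* Each RELOCTAB entry names an offset within the object code where a
       16-bit segment base must be filled in at link time. */
    struct reloctab_entry { uint16_t offset; };

    static void relocate(uint8_t *code, const struct reloctab_entry *tab,
                         int n, uint16_t segment_base) {
        for (int i = 0; i < n; i++) {
            /* store the 16-bit segment base little-endian, as on the 8088 */
            code[tab[i].offset]     = (uint8_t)(segment_base & 0xFF);
            code[tab[i].offset + 1] = (uint8_t)(segment_base >> 8);
        }
    }

    int main(void) {
        uint8_t code[8] = { 0xB8, 0x00, 0x00 };     /* MOV AX, 0000 with a blank operand */
        struct reloctab_entry tab[] = { { 1 } };    /* operand bytes start at offset 1   */
        relocate(code, tab, 1, 0x2000);             /* segment base known only at link time */
        printf("patched operand: %02X %02X\n", code[1], code[2]);
        return 0;
    }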

Assignment Set – 2

1. Write about Deterministic and Non-Deterministic Finite Automata with suitable numerical examples?

Ans: Deterministic finite automata

Definition (Deterministic finite automaton): A deterministic finite automaton (DFA) is a 5-tuple (S, Σ, T, s, A) consisting of
· an alphabet (Σ)
· a set of states (S)
· a transition function (T : S × Σ → S)
· a start state (s ∈ S)
· a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition function T to determine the next state using the current state and the symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the DFA recognizes.

Non-Deterministic Finite Automaton (NFA)

A non-deterministic finite automaton (NFA) is a 5-tuple (S, Σ, T, s, A) consisting of
· an alphabet (Σ)
· a set of states (S)
· a transition function (T : S × Σ → P(S)), where P(S) is the power set of S
· a start state (s ∈ S)
· a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition relation T to determine the next state(s) using the current state and the symbol just read or the empty string ε. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the NFA recognizes. A DFA or NFA can easily be converted into a GNFA, and then the GNFA can be easily converted into a regular expression by reducing the number of states until S = {s, a}.
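A small numerical illustration in C (added here, not from the original answer): a DFA's transition function can be stored as a table indexed by state and input symbol. The particular table below happens to encode the even-number-of-0s machine M that is presented in the example that follows; the state numbering 0 = S1, 1 = S2 is an assumption of this sketch.

    #include <stdio.h>

    /* States: 0 = S1 (even number of 0s so far, accepting), 1 = S2 (odd). */
    static const int T[2][2] = {
        /* input: '0' '1' */
        /* S1 */ { 1,  0 },
        /* S2 */ { 0,  1 },
    };

    static int accepts(const char *input) {
        int state = 0;                         /* start state S1 */
        for (; *input; input++)
            state = T[state][*input - '0'];    /* assumes input is over {0,1} */
        return state == 0;                     /* accept iff we end in S1 */
    }

    int main(void) {
        const char *samples[] = { "1001", "000", "", "0110" };
        for (int i = 0; i < 4; i++)
            printf("%-5s -> %s\n", samples[i], accepts(samples[i]) ? "accept" : "reject");
        return 0;
    }

Running it accepts "1001", "" and "0110" (an even number of 0s) and rejects "000", matching the behaviour of M described below.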

B) Conditional Assembly . A 1 in the input does not change the state of the automaton. 0) = S2 o T(S1. you can include or exclude parts of the program according to various conditions. which determines if the input contains an even number of 0s. and defined as follows: o T(S1. You can define macros. s. 0) = S1 o T(S2. S2} · s = S1 · A = {S1} The transition function T is visualized by the directed graph shown on the right. There are two main methods for handling where to generate the outputs for a finite state machine. If you use a program to combine or rearrange source files into an intermediate file which is then compiled. T. which are brief abbreviations for longer constructs. 1} · S = {S1. you can use line control to inform the compiler of where each source line originally came from. The C preprocessor provides four separate facilities that you can use as you see fit: · Inclusion of header files. while S2 signifies an odd number. 3. · Conditional compilation. 1) = S2 Simply put. They are called a Moore Machine and a Mearly Machine. 1) = S1 o T(S2. These are files of declarations that can be substituted into your program. Using special preprocessing directives. See the following figure M = (S. which are abbreviations for arbitrary fragments of C code. It is called a macro processor because it allows you to define macros. named after their respective authors. · Macro expansion.Deterministic Finite State Machine The following example explains a deterministic finite state machine (M) with a binary alphabet. Σ. and then the C preprocessor will replace the macros with their definitions throughout the program. When the input ends. the state will show whether the input contained an even number of 0s or not. · Line control. A) · Σ = {0. Write a short note on: A) C Preprocessor for GCC version 2 Ans: The C Preprocessor for GCC version 2 The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. the state S1 represents that there has been an even number of 0s in the input so far.

Conditional Assembly means that some sections of the program may be optional, either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest for the average user. A program fragment that assembles the instructions to print the AX register only if Debug is true illustrates the idea; note that true is any non-zero value. The analogous conditional statement in C programming is shown below; the following directive tests the expression `BUFSIZE == 1020', where `BUFSIZE' must be a macro.

    #if BUFSIZE == 1020
      printf ("Large buffers!\n");
    #endif /* BUFSIZE is large */

Strictly speaking, ANSI Standard C requires the rejection of many harmless constructs commonly used by today's C programs. Such incompatibility would be inconvenient for users, so the GNU C preprocessor is configured to accept these constructs by default. To get strict ANSI Standard C, you must use the options `-trigraphs', `-undef' and `-pedantic', but in practice the consequences of having strict ANSI Standard C make it undesirable to do this.

4. Write about different Phases of Compilation?

Ans: Phases of Compiler

A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is not reasonable, either from a logical point of view or from an implementation point of view, to consider the compilation process as occurring in one single step. For this reason, it is customary to partition the compilation process into a series of sub-processes called phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.

The first phase, called the lexical analyzer, or scanner, separates characters of the source language into groups that logically belong together; these groups are called tokens. The usual tokens are keywords, such as DO or IF, identifiers, such as X or NUM, operator symbols such as <= or +, and punctuation symbols such as parentheses or commas.

The output of the lexical analyzer is a stream of tokens, which is passed to the next phase, the syntax analyzer, or parser. The tokens in this stream can be represented by codes which we may regard as integers; thus DO might be represented by 1, + by 2, and "identifier" by 3. In the case of a token like "identifier", a second quantity, telling which of the identifiers used by the program is represented by this instance of the token, is passed along with the integer code for "identifier".

The syntax analyzer groups tokens together into syntactic structures. For example, the three tokens representing A + B might be grouped into a syntactic structure called an expression. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens; the interior nodes of the tree represent strings of tokens that logically belong together.

The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles of intermediate code are possible. One common style uses instructions with one operator and a small number of operands. These instructions can be viewed as simple macros, like a macro ADD2. The primary difference between intermediate code and assembly code is that the intermediate code need not specify the registers to be used for each operation.

Code Optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and/or takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and/or space.

The final phase, code generation, produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done. Designing a code generator that produces truly efficient object programs is one of the most difficult parts of compiler design, both practically and theoretically.

The Table-Management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc.). The data structure used to record this information is called a symbol table.

The Error Handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic and adjust the information being passed from phase to phase so that each phase can proceed. It is desirable that compilation be completed on flawed programs, at least through the syntax-analysis phase, so that as many errors as possible can be detected in one compilation. Both the table-management and error-handling routines interact with all phases of the compiler.

Lexical Analysis

The lexical analyzer is the interface between the source program and the compiler. The lexical analyzer reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses are typical tokens.

There are two kinds of token: specific strings such as IF or a semicolon, and classes of strings such as identifiers, constants, or labels.

Syntax Analysis

The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler.

Intermediate Code Generation

On a logical level the output of the syntax analyzer is some representation of a parse tree. The intermediate code generation phase transforms this parse tree into an intermediate language representation of the source program called three-address code.

Three-Address Code

One popular type of intermediate language is what is called "three-address code". A typical three-address code statement is

    A := B op C

where A, B and C are operands and op is a binary operator.

Code Optimization

Object programs that are frequently executed should be fast and small. Certain compilers have within them a phase that tries to apply transformations to the output of the intermediate code generator, in an attempt to produce an intermediate-language version of the source program from which a faster or smaller object-language program can ultimately be produced. This phase is popularly called the optimization phase. A good optimizing compiler can improve the target program by perhaps a factor of two in overall speed, in comparison with a compiler that generates code carefully but without using specialized techniques generally referred to as code optimization. There are two types of optimization used:
· Local Optimization
· Loop Optimization

Code Generation

The code generation phase converts the intermediate code into a sequence of machine instructions. A simple-minded code generator might map the statement A := B + C into the machine code sequence

    LOAD  B
    ADD   C
    STORE A

However, such a straightforward macro-like expansion of intermediate code into machine code usually produces a target program that contains many redundant loads and stores and that utilizes the resources of the target machine inefficiently. To avoid these redundant loads and stores, a code generator might keep track of the run-time contents of registers; knowing what quantities reside in registers, the code generator can generate loads and stores only when necessary. Many computers have only a few high-speed registers in which computations can be performed particularly quickly. A good code generator would therefore attempt to utilize these registers as efficiently as possible. This aspect of code generation, called register allocation, is particularly difficult to do optimally.
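To make the point about redundant loads concrete, here is a small illustrative C sketch (added to this answer, not taken from it). It emits the naive LOAD/ADD/STORE sequence for a list of three-address statements on an imaginary one-accumulator machine, skipping a LOAD when the accumulator already holds the left operand; the structure names and the tiny instruction set are assumptions for the example.

    #include <stdio.h>
    #include <string.h>

    /* Naive code generation for three-address statements A := B op C,
       with one improvement: skip the LOAD when the accumulator holds B. */
    struct tac { char a[8], b[8], c[8]; char op; };

    int main(void) {
        struct tac prog[] = {
            { "T1", "B",  "C", '+' },     /* T1 := B + C  */
            { "A",  "T1", "D", '+' },     /* A  := T1 + D */
        };
        char acc[8] = "";                 /* what the accumulator currently holds */
        for (int i = 0; i < 2; i++) {
            if (strcmp(acc, prog[i].b) != 0)          /* LOAD only when necessary */
                printf("LOAD  %s\n", prog[i].b);
            printf("%s   %s\n", prog[i].op == '+' ? "ADD" : "SUB", prog[i].c);
            printf("STORE %s\n", prog[i].a);
            strcpy(acc, prog[i].a);                   /* result now in accumulator */
        }
        return 0;
    }

For the two statements above it emits LOAD B / ADD C / STORE T1 / ADD D / STORE A, saving the redundant LOAD T1 that a purely macro-like expansion would produce.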

5. What is MACRO? Discuss its Expansion in detail with a suitable example.

Ans: Macro definition and Expansion

Definition (macro): A macro name is an abbreviation which stands for some related lines of code. Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. A macro consists of a name, a set of formal parameters and a body of code. The use of the macro name with a set of actual parameters is replaced by code generated from its body; this is called macro expansion. Each use of a macro generates new program instructions, so the macro has the effect of automating the writing of the program.

Macros are useful for the following purposes:
· To simplify and reduce the amount of repetitive coding
· To reduce errors caused by repetitive coding
· To make an assembly program more readable

Macros can be defined and used in many programming languages, like C, C++, etc. In a programming language such as C or assembly language, a macro name defines a set of commands that are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. A macro call leads to macro expansion: during macro expansion, the macro statement is replaced by the corresponding sequence of statements. In assembly language, every macro definition begins with the MACRO keyword and ends with ENDM (end macro). If the macro has parameters, they are substituted into the macro body during expansion.

Example macro in C programming:

    #define max(a, b) a>b? a: b

defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing,

    z = max(x, y);

becomes

    z = x>y? x: y;

While this use of macros is very important for C, for instance to define type-safe generic data types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls. The usual reason for using a macro is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance. C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments). Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent when the program is prepared for execution, whereas function instructions are copied into a program only once. Macros which mimic functions can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.

Example: Macro calling in a high level programming language (C programming). Figure 1.1 (Macro expansion on a source program) shows the source with the macro call on the left and the expanded code on the right.

    #define max(a,b) a>b?a:b
    int main() {
        int x, y, z;
        x = 4;
        y = 6;
        z = max(x, y);
        return 0;
    }

In the above program a macro call appears in the assignment to z. Whenever a macro is called, the entire body is substituted into the program at the point of call. After macro expansion, the whole code would appear like this:

    #define max(a,b) a>b?a:b
    int main() {
        int x, y, z;
        x = 4;
        y = 6;
        z = x>y?x:y;
        return 0;
    }
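One of the pitfalls mentioned above can be shown with a short added example: because the macro body re-evaluates its arguments, an argument with a side effect behaves differently from a function call. The variable names below are made up for the illustration.

    #include <stdio.h>

    #define max(a, b) ((a) > (b) ? (a) : (b))   /* parenthesized, but still re-evaluates */

    int main(void) {
        int i = 4, j = 6;
        /* Expands to ((i++) > (j++) ? (i++) : (j++)); j is incremented twice,
           so the "maximum" obtained is 7, not 6 - a classic macro pitfall. */
        int m = max(i++, j++);
        printf("m = %d, i = %d, j = %d\n", m, i, j);   /* m = 7, i = 5, j = 8 */
        return 0;
    }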

6. What is linking? Explain dynamic linking in detail.

Ans: Linking is the process where object modules are connected together (linked) to form one large program, resolving external variables and function calls. That is, if you have three source code files called myprogram.c, searchfunction.c, and printreport.c, you would compile them separately into myprogram.obj, searchfunction.obj, and printreport.obj. The linker will take these object modules, along with any library code that needs to be included, and create one .exe file.

Dynamic Linking

Dynamic linking defers much of the linking process until a program starts running. It provides a variety of benefits that are hard to get otherwise:
• Dynamically linked shared libraries are easier to create than static linked shared libraries.

• Dynamically linked shared libraries are easier to update than static linked shared libraries.
• The semantics of dynamically linked shared libraries can be much closer to those of unshared libraries.
• Dynamic linking permits a program to load and unload routines at runtime, a facility that can otherwise be very difficult to provide.

There are a few disadvantages, of course. The runtime performance costs of dynamic linking are substantial compared to those of static linking, since a large part of the linking process has to be redone every time a program runs. Every dynamically linked symbol used in a program has to be looked up in a symbol table and resolved. (Windows DLLs mitigate this cost somewhat, as we describe below.) Dynamic libraries are also larger than static libraries, since the dynamic ones have to include symbol tables.

Beyond issues of call compatibility, a chronic source of problems is changes in library semantics. Since dynamic shared libraries are so easy to update compared to unshared or static shared libraries, it's easy to change libraries that are in use by existing programs, which means that the behavior of those programs changes even though "nothing has changed". This is a frequent source of problems on Microsoft Windows, where programs use a lot of shared libraries, libraries go through a lot of versions, and library version control is not very sophisticated. Most programs ship with copies of all of the libraries they use, and installers often will inadvertently install an older version of a shared library on top of a newer one, breaking programs that are expecting features found in the newer one. Well-behaved applications pop up a warning before installing an older library over a newer one, but even so, programs that depend on the semantics of older libraries have been known to break when newer versions replace the older ones.
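As an illustration of loading and unloading a routine at runtime (added here for clarity), the sketch below uses the POSIX dlopen interface. The library name "libm.so.6" and the symbol "cos" exist on typical Linux systems; on other platforms the names would differ.

    #include <dlfcn.h>
    #include <stdio.h>

    /* Load a shared library at runtime and call a function from it.
       Compile with: cc demo.c -ldl */
    int main(void) {
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (!handle) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (!cosine) { fprintf(stderr, "dlsym: %s\n", dlerror()); dlclose(handle); return 1; }

        printf("cos(0.0) = %f\n", cosine(0.0));   /* prints 1.000000 */
        dlclose(handle);                          /* unload the routine again */
        return 0;
    }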
