1. Explain the following:
a. Lexical Analysis
b. Syntax Analysis
Ans.
a. Lexical Analysis
The lexical analyzer is the interface between the source program and the compiler. It reads the source program one character at a time, carving it into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses are typical tokens. There are two kinds of tokens: specific strings such as IF or a semicolon, and classes of strings such as identifiers, constants, or labels.
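The carving of characters into the two kinds of tokens can be sketched with a small regular-expression scanner. This is an illustrative sketch only; the token class names and the keyword set are assumptions chosen to mirror the text, not part of any particular compiler.

```python
import re

# Token classes mirroring the text: classes of strings (identifiers,
# constants) and specific strings (keywords, operators, punctuation).
TOKEN_SPEC = [
    ("CONST", r"\d+"),
    ("ID",    r"[A-Za-z_]\w*"),   # keywords are filtered out below
    ("OP",    r"[+\-*/=<>]"),
    ("PUNCT", r"[(),;]"),
    ("SKIP",  r"\s+"),
]
KEYWORDS = {"IF", "THEN", "ELSE"}          # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Read the source one character run at a time, yielding (kind, lexeme)."""
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"               # a specific string, e.g. IF
        yield (kind, lexeme)
```

For example, `tokenize("IF (x1 > 2) THEN y = x1;")` yields a KEYWORD token for `IF`, identifier tokens for `x1` and `y`, a constant token for `2`, and operator/punctuation tokens for the rest.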
The lexical analyzer and the following phase, the syntax analyzer, are often grouped together into the same pass. In that pass, the lexical analyzer operates either under the control of the parser or as a co-routine with the parser. The parser asks the lexical analyzer for the next token, and the lexical analyzer returns to the parser a code for the token that it found. If the token is an identifier or another token with a value, the value is also passed to the parser. The usual method of providing this information is for the lexical analyzer to call a bookkeeping routine which installs the actual value in the symbol table if it is not already there. The lexical analyzer then passes the two components of the token to the parser. The first is a code for the token type (identifier), and the second is the value: a pointer to the place in the symbol table reserved for the specific value found.
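This hand-off can be sketched as follows. The names `install` and `next_token`, and the table layout, are invented for illustration; a real lexer would carry richer attributes.

```python
SYMTAB = []          # symbol table: list of (name, attributes) entries
SYMTAB_INDEX = {}    # name -> position of its entry in SYMTAB

def install(name):
    """Bookkeeping routine: enter name in the symbol table if it is
    not already there, and return a pointer (index) to its entry."""
    if name not in SYMTAB_INDEX:
        SYMTAB_INDEX[name] = len(SYMTAB)
        SYMTAB.append((name, {}))
    return SYMTAB_INDEX[name]

def next_token(lexemes):
    """Called by the parser: yield (token-code, value) pairs.
    For identifiers the value is a pointer into the symbol table;
    specific strings such as '=' carry no value."""
    for lexeme in lexemes:
        if lexeme.isidentifier():
            yield ("id", install(lexeme))
        else:
            yield (lexeme, None)
```

Note that repeated occurrences of the same identifier yield the same symbol-table pointer, which is what lets later phases treat them as the same entity.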
b. Syntax Analysis
The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.

2. What is RISC and how is it different from CISC?
Ans. The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design philosophy that favors a simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are AVR, ARM, DEC Alpha, MIPS, PA-RISC, PIC, SPARC, and IBM's PowerPC.

RISC characteristics:
Small number of machine instructions: less than 150
Small number of addressing modes: less than 4
Small number of instruction formats: less than 4
Instructions of the same length: 32 bits (or 64 bits)
Single cycle execution
Hardwired control
Large number of GPRs (General Purpose Registers): more than 32
Load/Store architecture
Support for HLL (High Level Language)

RISC vs CISC:
CISC: Emphasis on hardware; includes multi-clock complex instructions; memory-to-memory ("LOAD" and "STORE" incorporated in instructions); small code sizes with high cycles per second; transistors used for storing complex instructions.
RISC: Emphasis on software; single-clock reduced instructions only; register-to-register ("LOAD" and "STORE" are independent instructions); low cycles per second with large code sizes; spends more transistors on memory registers.

3. Explain the following with respect to the design specifications of an Assembler:
A) Data Structures
B) Pass 1 & Pass 2 Assembler flow chart
Ans.
A) Data Structures
The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.
6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assignment location.
7. A copy of the input to be used by pass 2.

Pass 2 Data Structures
1. Copy of source program input to pass 1.
2. Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates for each instruction: symbolic mnemonic, length (two, four, or six bytes), binary machine opcode, and format of instruction.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A work space, INST, that is used to hold each instruction as its various parts are being assembled together.
8. A work space, PRINT LINE, used to produce a printed listing.
9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.

Fig. 10.3: Data structures of the assembler

B) Pass 1 & Pass 2 Assembler flow chart
The third step in our design procedure is to specify the format and content of each of the data structures. The Machine-Operation Table (MOT) and the Pseudo-Operation Table (POT) are examples of fixed tables: their contents are not filled in or altered during the assembly process. Pass 2 requires a machine-operation table (MOT) containing the name, length, binary code, and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT.
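The single combined MOT can be sketched as a lookup table in which pass 1 consults only the length field while pass 2 uses the full entry. The specific entries and encodings below are illustrative assumptions modeled on System/360-style mnemonics, not a complete table.

```python
# Combined Machine-Operation Table (MOT) sketch.
# Each entry: mnemonic -> (length in bytes, binary opcode, format).
# Entries are illustrative, modeled on System/360-style instructions.
MOT = {
    "A":   (4, 0x5A, "RX"),
    "AH":  (4, 0x4A, "RX"),
    "AL":  (4, 0x5E, "RX"),
    "ALR": (2, 0x1E, "RR"),
}

def pass1_length(mnemonic):
    """Pass 1 needs only the instruction length, to update the LC."""
    return MOT[mnemonic][0]

def pass2_entry(mnemonic):
    """Pass 2 needs length, opcode, and format to assemble the instruction."""
    return MOT[mnemonic]
```

Keeping one table avoids storing the mnemonic and length twice, at the cost of pass 1 carrying fields it never reads.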
The following figure depicts the format of the machine-op table (MOT). Each entry is 6 bytes:

Mnemonic opcode   Binary opcode   Instruction       Instruction       Not used
(4 bytes,         (1 byte,        length (2 bits,   format (3 bits,   here
characters)       hexadecimal)    binary)           binary)           (3 bits)
"Abbb"            5A              10                001
"AHbb"            4A              10                001
"ALbb"            5E              10                001
"ALRb"            1E              01                000
...

Here 'b' represents 'blank'.

Fig.: The flowchart for Pass 1:
The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbol names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; however, others must be inferred. To implement memory allocation, a data structure called the location counter (LC) is introduced. The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the constant. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents. This ensures that LC points to the next memory word in the target program even when machine instructions have different lengths and DS/DC statements reserve different amounts of memory. To update the contents of LC, the analysis phase needs to know the lengths of different instructions. This information simply depends on the assembly language, hence the mnemonics table can be extended to include this information in a new field called length. We refer to the processing involved in maintaining the location counter as LC processing.
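LC processing during pass 1 can be sketched as follows. The statement format and the length table are simplified assumptions; a real assembler would parse operands and compute DS/DC lengths from their expressions.

```python
# Simplified mnemonics table extended with a length field (in words).
# In reality, DS/DC lengths would be computed from their operands.
LENGTH = {"MOVER": 1, "ADD": 1, "MOVEM": 1, "DS": 1, "DC": 1}

def pass1(statements, start=100):
    """Build the symbol table by LC processing.
    Each statement is a (label-or-None, mnemonic) pair."""
    symtab, lc = {}, start          # LC initialized to the start address
    for label, mnemonic in statements:
        if label is not None:
            symtab[label] = lc      # enter (label, LC) in the symbol table
        lc += LENGTH[mnemonic]      # advance LC by the statement's length
    return symtab
```

Because every statement advances LC by its own length, labels defined later in the program get the correct addresses even when statement sizes differ.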
Fig.: The flowchart for Pass 2
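The central step of pass 2 can be sketched by combining an opcode table with the symbol table built in pass 1. The encoding scheme below, an (opcode, address) pair per statement, is a simplification assumed for illustration, as are the opcode values.

```python
OPCODES = {"L": 0x58, "A": 0x5A, "ST": 0x50}   # assumed opcode bytes

def pass2(statements, symtab):
    """Assemble each (mnemonic, symbolic-operand) statement into a
    simplified (opcode, address) pair, resolving each symbol through
    the symbol table produced by pass 1."""
    return [(OPCODES[mn], symtab[operand]) for mn, operand in statements]
```

Pass 2 can resolve forward references that pass 1 could not, because by the time it runs, every label already has an address in the symbol table.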
4. Define the following: A) Parsing B) Scanning C) Token
Ans.
A) Parsing
Parsing transforms input text or a string into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Lexical analysis creates tokens from a sequence of input characters, and it is these tokens that are processed by a parser to build a data structure such as a parse tree or an abstract syntax tree. Conceptually, the parser accepts a sequence of tokens and produces a parse tree. In practice this might not occur:
1. The source program might have errors. Shamefully, we will do very little error handling.
2. Real compilers produce (abstract) syntax trees, not parse trees (concrete syntax trees). We don't do this for the pedagogical reasons given previously.
There are three classes of grammar-based parsers:
1. Universal
2. Top-down
3. Bottom-up
The universal parsers are not used in practice as they are inefficient; we will not discuss them. As expected, top-down parsers start from the root of the tree and proceed downward, whereas bottom-up parsers start from the leaves and proceed upward. The commonly used top-down and bottom-up parsers are not universal. That is, there are (context-free) grammars that cannot be used with them. The LL and LR parsers are important in practice. Hand-written parsers are often LL. Specifically, the predictive parsers we looked at in chapter two are for LL grammars. The LR grammars form a larger class. Parsers for this class are usually constructed with the aid of automatic tools.
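A hand-written LL (predictive, top-down) parser can be sketched for a toy grammar. The grammar E -> ID ('+' E)? is an assumption chosen for illustration; each nonterminal becomes a function, and the parse proceeds from the root of the tree downward.

```python
def parse_expr(tokens, pos=0):
    """Recursive-descent (top-down, LL) parser for E -> ID ('+' E)?.
    Returns (parse-tree, next-position); raises on malformed input."""
    tok = tokens[pos]
    if not tok.isidentifier():
        raise SyntaxError(f"expected identifier, got {tok!r}")
    tree, pos = ("ID", tok), pos + 1
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_expr(tokens, pos + 1)   # descend for right side
        tree = ("+", tree, right)
    return tree, pos
```

The single lookahead token is enough to decide which production to use, which is exactly the predictive property that makes the grammar LL.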
B) Scanning
A compiler is a program which converts the source program into machine-level language; it is a translator. The compiler performs analysis for sentence generation and interpretation. Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. Each of these phases changes the representation of the program being compiled. The phases are called Lexical Analysis or Scanning, which transforms the program from a string of characters to a string of tokens; Syntax Analysis or Parsing, which transforms the program into some kind of syntax tree; and Semantic Analysis, which decorates the tree with semantic information. The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following C statements

x3 = y + 3;
x3 = y + 3 ;
x3 = y+ 3 ;

but not

x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.

C) Token
A token is a <token-name, attribute-value> pair. The hierarchical decomposition of the above statement is given in the following figure.
Fig.
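The lexeme grouping in this example can be demonstrated concretely. This is a minimal sketch assuming C-like lexeme classes (identifiers, numbers, one-character operators/punctuation); whitespace only separates lexemes, it never appears inside one.

```python
import re

def lexemes(stmt):
    """Group a character stream into lexemes: identifiers, numbers,
    and single operator/punctuation characters. Whitespace between
    lexemes is discarded, but it always ends the current lexeme."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", stmt)
```

All three spellings of the assignment yield the same six lexemes, while `x 3 = y + 3;` does not, because the space splits `x3` into the two lexemes `x` and `3`.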
5. Describe the process of Bootstrapping in the context of Linkers.
Ans. The discussions of loading up to this point have all presumed that there's already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how is the first program loaded into the computer? In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as bootstrap ROM, as in "pulling one's self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system's address space. The bootstrap ROM occupies the top 64K of the address space, and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk, or if that fails the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it.

Why not just load the operating system directly? Because you can't fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it's tricky to address more than 1MB of memory). (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.) The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user-mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application-level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

6. Describe the procedure for design of a Linker.
Ans. Relocation and linking requirements in segmented addressing:
The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of the segmented addressing structure reduces the relocation requirements of a program.
-----end-----