I.

INSTRUCTION SET ARCHITECTURE

The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as the boundary between software and hardware. We will briefly describe the instruction sets found in many of the microprocessors used today. The ISA of a processor can be described using 5 categories:

1. Operand storage in the CPU - Where are the operands kept other than in memory?
2. Number of explicitly named operands - How many operands are named in a typical instruction?
3. Operand location - Can any ALU instruction operand be located in memory, or must all operands be kept internally in the CPU?
4. Operations - What operations are provided in the ISA?
5. Type and size of operands - What is the type and size of each operand and how is it specified?

Of all the above, the most distinguishing factor is the first.

The 3 most common types of ISAs are:

1. Stack - The operands are implicitly on top of the stack.
2. Accumulator - One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly named; they are either registers or memory locations.
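As a rough illustration of the three models listed above, the following Python sketch runs one small program per model, each storing the sum of memory locations A and B into C. The interpreter functions, the tuple-based program format, and the sample values are assumptions made for this sketch; they do not model any real processor.

```python
# Toy interpreters for the three ISA styles. Instruction names and the
# tuple-based program format are assumptions made for this sketch.

def run_stack(program, memory):
    """Stack machine: ADD takes both operands implicitly from the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "POP":
            memory[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


def run_accumulator(program, memory):
    """Accumulator machine: one operand is always the implicit accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


def run_gpr(program, memory):
    """GPR machine: every operand (register or memory) is named explicitly."""
    regs = {}
    for op, reg, addr in program:
        if op == "LOAD":
            regs[reg] = memory[addr]
        elif op == "ADD":
            regs[reg] += memory[addr]
        elif op == "STORE":
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


stack_prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
acc_prog = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
gpr_prog = [("LOAD", "R1", "A"), ("ADD", "R1", "B"), ("STORE", "R1", "C")]

for run, prog in [(run_stack, stack_prog),
                  (run_accumulator, acc_prog),
                  (run_gpr, gpr_prog)]:
    result = run(prog, {"A": 2, "B": 3, "C": 0})
    print(run.__name__, "C =", result["C"])
```

Note how the stack and accumulator programs never name a register, while every GPR instruction spells out both of its operands.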

Let's look at the assembly code for C = A + B in all 3 architectures:

    Stack        Accumulator    GPR
    PUSH A       LOAD A         LOAD R1,A
    PUSH B       ADD B          ADD R1,B
    ADD          STORE C        STORE R1,C
    POP C

Not all processors can be neatly tagged into one of the above categories. The i8086 has many instructions that use implicit operands although it has a general register set. The i8051 is another example: it has 4 banks of GPRs, but most instructions must have the A register as one of their operands.

What are the advantages and disadvantages of each of these approaches?

Stack
Advantages: Simple model of expression evaluation (reverse Polish). Short instructions.
Disadvantages: A stack can't be randomly accessed. This makes it hard to generate efficient code. The stack itself is accessed on every operation and becomes a bottleneck.

Accumulator
Advantages: Short instructions.
Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for this approach.

GPR
Advantages: Makes code generation easy. Data can be stored for long periods in registers.
Disadvantages: All operands must be named, leading to longer instructions.

Earlier CPUs were of the first 2 types, but in the last 15 years nearly all CPUs made have been GPR processors. There are 2 major reasons. The first is that registers are faster than memory: the more data that can be kept internally in the CPU, the faster the program will run. The other reason is that registers are easier for a compiler to use.

Reduced Instruction Set Computer (RISC)

As we mentioned before, most modern CPUs are of the GPR (General Purpose Register) type. A few examples of such CPUs are the IBM 360, DEC VAX, Intel 80x86 and Motorola 68xxx. But while these CPUs were clearly better than previous stack and accumulator based CPUs, they were still lacking in several areas:

1. Instructions were of varying length, from 1 byte to 6-8 bytes. This causes problems with the pre-fetching and pipelining of instructions.
2. ALU (Arithmetic Logical Unit) instructions could have operands that were memory locations. Because the number of cycles it takes to access memory varies, so does the time taken by the whole instruction. This isn't good for compiler writers, pipelining and multiple issue.
3. Most ALU instructions had only 2 operands, where one of the operands is also the destination. This means this operand is destroyed during the operation, or it must be saved somewhere beforehand.

Thus in the early 80's the idea of RISC was introduced. The RISC project that later led to SPARC was started at Berkeley, and the MIPS project at Stanford. RISC stands for Reduced Instruction Set Computer. The ISA is composed of instructions that all have exactly the same size, usually 32 bits. Thus they can be pre-fetched and pipelined successfully. All ALU instructions have 3 operands, which are only registers. The only memory access is through explicit LOAD/STORE instructions. Thus C = A + B will be assembled as:

    LOAD R1,A
    LOAD R2,B
    ADD R3,R1,R2
    STORE C,R3

Although it takes 4 instructions, we can reuse the values in the registers.

Why is this architecture called RISC? What is Reduced about it? The answer is that, to make all instructions the same length, the number of bits used for the opcode is reduced. Thus fewer instructions are provided. The instructions that were thrown out are the less important string and BCD (binary-coded decimal) operations. In fact, now that memory access is restricted, there aren't several kinds of MOV or ADD instructions. By contrast, the older architecture is called CISC (Complex Instruction Set Computer). RISC architectures are also called LOAD/STORE architectures.

The number of registers in RISC is usually 32 or more. One of the first RISC CPUs, the MIPS R2000, has 32 GPRs, as opposed to 16 in the 68xxx architecture and 8 in the 80x86 architecture.

The main disadvantage of RISC is its code size. Usually more instructions are needed, and the fixed instruction length wastes space on operations that could be encoded in a short instruction (POP, PUSH).

II.

LEVELS OF PROGRAMMING LANGUAGE

There is only one programming language that any computer can actually understand and execute: its own native binary machine code. This is the lowest possible level of language in which it is possible to write a computer program. All other languages are said to be high level or low level according to how closely they can be said to resemble machine code.

In this context, a low-level language corresponds closely to machine code, so that a single low-level language instruction translates to a single machine-language instruction. A high-level language instruction typically translates into a series of machine-language instructions.

Low-level languages have the advantage that they can be written to take advantage of any peculiarities in the architecture of the central processing unit (CPU), which is the "brain" of any computer. Thus, a program written in a low-level language can be extremely efficient, making optimum use of both computer memory and processing time. However, writing a low-level program takes a substantial amount of time, as well as a clear understanding of the inner workings of the processor itself. Therefore, low-level programming is typically used only for very small programs, or for segments of code that are highly critical and must run as efficiently as possible.

High-level languages permit faster development of large programs. The final program as executed by the computer is not as efficient, but the savings in programmer time generally far outweigh the inefficiencies of the finished product. This is because the cost of writing a program is nearly constant for each line of code, regardless of the language. Thus, for a program of a given size in machine instructions, a high-level language where each line of code translates to 10 machine instructions costs only one tenth as much in program development as a low-level language where each line of code represents only a single machine instruction.

In addition to the distinction between high-level and low-level languages, there is a further distinction between compiler languages and interpreter languages. Let's take a look at the various levels.

Absolute Machine Code

The very lowest possible level at which you can program a computer is in its own native machine code, consisting of strings of 1's and 0's and stored as binary numbers. The main problems with using machine code directly are that it is very easy to make a mistake, and very hard to find it once you realize the mistake has been made.
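To give a feel for what programming in raw machine code involves, here is a minimal Python sketch. The 8-bit instruction format (4-bit opcode, 4-bit address) and the opcode values are invented for this illustration and do not correspond to any real CPU. The sketch decodes a three-instruction program and then shows how a single flipped bit silently turns one instruction into another.

```python
# Hypothetical 8-bit instruction format, invented for this sketch:
# the high 4 bits select the operation, the low 4 bits give a memory address.
OPCODES = {0b0001: "LOAD", 0b0010: "ADD", 0b0011: "STORE"}


def decode(word):
    """Split an 8-bit instruction word into (mnemonic, address)."""
    opcode = (word >> 4) & 0xF
    address = word & 0xF
    return OPCODES.get(opcode, "???"), address


# LOAD 4, ADD 5, STORE 6 written directly as binary numbers.
program = [0b00010100, 0b00100101, 0b00110110]

for word in program:
    name, addr = decode(word)
    print(f"{word:08b}  {name} {addr}")

# Flipping a single bit of the second word changes its opcode:
# the intended ADD 5 becomes STORE 5.
typo = program[1] ^ 0b00010000
print(f"{typo:08b}  {decode(typo)[0]} {decode(typo)[1]}")
```

The two bit strings 00100101 and 00110101 differ in one position, which is the kind of mistake that is easy to make and hard to spot when reading binary.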

Assembly Language

Assembly language is nothing more than a symbolic representation of machine code, which also allows symbolic designation of memory locations. Thus, an instruction to add the contents of a memory location to an internal CPU register called the accumulator might be written as "add a number" instead of a string of binary digits (bits).

No matter how close assembly language is to machine code, the computer still cannot understand it. The assembly-language program must be translated into machine code by a separate program called an assembler. The assembler program recognizes the character strings that make up the symbolic names of the various machine operations, and substitutes the required machine code for each instruction. At the same time, it also calculates the required address in memory for each symbolic name of a memory location, and substitutes those addresses for the names. The final result is a machine-language program that can run on its own at any time; the assembler and the assembly-language program are no longer needed. To help distinguish between the "before" and "after" versions of the program, the original assembly-language program is also known as the source code, while the final machine-language program is designated the object code.

If an assembly-language program needs to be changed or corrected, it is necessary to make the changes to the source code and then re-assemble it to create a new object program.

Compiler Language

Compiler languages are the high-level equivalent of assembly language. Each instruction in the compiler language can correspond to many machine instructions. Once the program has been written, it is translated to the equivalent machine code by a program called a compiler. Once the program has been compiled, the resulting machine code is saved separately, and can be run on its own at any time.

As with assembly-language programs, updating or correcting a compiled program requires that the original (source) program be modified appropriately and then recompiled to form a new machine-language (object) program.

Typically, the compiled machine code is less efficient than the code produced when using assembly language. This means that it runs a bit more slowly and uses a bit more memory than the equivalent assembled program. To offset this drawback, however, it takes much less time to develop a compiler-language program, so it can be ready to go sooner than the assembly-language program.
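The idea that a single high-level statement expands into several low-level instructions can be sketched with a toy translator. The example below uses Python's standard `ast` module to parse one assignment and emit instructions for a hypothetical stack machine. The function name `compile_assignment` and the mnemonics PUSH, PUSHI, ADD, MUL and POP are assumptions made for this sketch, and a real compiler does far more (type checking, register allocation, optimization).

```python
import ast


def compile_assignment(source):
    """Translate one statement such as 'c = a + b * 2' into stack-machine code.

    The instruction names are hypothetical and chosen only for illustration.
    """
    stmt = ast.parse(source).body[0]
    if (not isinstance(stmt, ast.Assign)
            or len(stmt.targets) != 1
            or not isinstance(stmt.targets[0], ast.Name)):
        raise ValueError("expected a simple assignment to one variable")

    code = []

    def emit(node):
        # Post-order walk: operands first, then the operator.
        if isinstance(node, ast.Name):
            code.append(f"PUSH {node.id}")
        elif isinstance(node, ast.Constant):
            code.append(f"PUSHI {node.value}")
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            emit(node.left)
            emit(node.right)
            code.append("ADD" if isinstance(node.op, ast.Add) else "MUL")
        else:
            raise ValueError("unsupported expression")

    emit(stmt.value)
    code.append(f"POP {stmt.targets[0].id}")
    return code


for line in compile_assignment("c = a + b * 2"):
    print(line)
```

Here one line of source becomes six low-level instructions, which is the one-to-many relationship the text describes.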

Interpreter Language

An interpreter language, like a compiler language, is considered to be high level. However, it operates in a totally different manner from a compiler language. Rather than being translated ahead of time, the user's program is executed directly by the interpreter program, which resides in memory and runs the high-level program without preliminary translation to machine code.

This use of an interpreter program to directly execute the user's program has both advantages and disadvantages. The primary advantage is that you can run the program to test its operation, make a few changes, and run it again directly. There is no need to recompile because no new machine code is ever produced. This can enormously speed up the development and testing process.

On the down side, this arrangement requires that both the interpreter and the user's program reside in memory at the same time. In addition, because the interpreter has to scan the user's program one line at a time and execute internal portions of itself in response, execution of an interpreted program is much slower than for a compiled program.

III.