
CS 431, Assignment 1, Assembler

This is a programming assignment. Your task is to write an assembler for MISC.

This assembler will not be complete. It has to be functional from beginning to end, but it only needs to include the logic to handle the specific instructions that appear in the example test program. It would be long and repetitive to include all of the possible combinations of instructions and operands. If necessary, making the assembler more complete may be part of a future assignment.

In other words, you need to write a program that will read an assembly language source file and produce a machine language file that will run on the MISC simulation. MISC information and source files are posted on the Web page. Information on the MISC assembly language is given in the document miscinfoassembly.doc. That document should provide the background necessary to understand the functionality of the assembler, while this document provides some implementation details.

You may break your code into as many classes as necessary. When handing in your assignment, provide a brief readme file in case the names of the files do not make it obvious how to compile and run your code. Your program class or classes will be black box tested on the sumtenV1asm.txt file.

These are the possible outcomes of the assignment:

1. Your code does not compile correctly: credit.
2. Your code compiles, but it has run time errors or some other run time problem that either results in no output or output that is obviously incorrect: credit.
3. Your code produces what superficially appears to be a correctly formatted machine language output file, but when this file is run on the MISC simulation, it does not give the right result: credit.
4. Your code produces an output file which gives the correct result when run on MISC: full credit.

Here is some additional information on the scope of this assignment:

1. An assembler is not a compiler; it is a precursor to a compiler. Some of the basic ideas involved in an assembler are related to the logic of the first phase of a compiler. That means that when it comes time to write a compiler, some of the problems have already been solved, and the logic of the assembler can be recycled when starting to write the compiler.
2. In addition to being a precursor, as the book states, an assembler may be part of the context of a compiler. In other words, if a compiler is designed to produce assembly language code as output rather than machine language code, then an assembler is necessary in order to get code runnable on the hardware. This is the model that will be followed in the assignments for this class. We will ultimately be concerned with a compiler that produces assembly language. A working assembler will be necessary in order to test its output.
3. The basic theoretical difference between an assembler and a compiler can be explained as follows: Assembly language code does not have a hierarchical structure. That means that the assembler does not have to do syntax analysis, or parsing. Its functionality is linear and is restricted to lexical analysis, or scanning. It is helpful to be able to work on lexical analysis, which is simpler, before tackling syntactical analysis.
4. An assembler has to deal with the concept of identifiers and management of identifiers using a symbol table. It is helpful to get a clear understanding of the need for these concepts, which also come up in a compiler, before taking on the additional problems that come with implementing the logic of a compiler.

Because your code will be black box tested, how you do the implementation is not of concern to me. I give some ideas about implementation below based on my experience solving the problem. You may or may not find these ideas useful. You are not obligated to make use of them.

1. I wrote my code in Java, and you may also be making use of an object oriented language. However, the logic of the assembler was not object oriented. In principle the assembler could be written as one long, complicated main() method based on loops and ifs. It is not immediately clear how to decompose the problem into classes, but from the structured programming point of view it is clear how to decompose it into a set of modules. In other words, the overall problem can be broken into manageable sized pieces by implementing functionality in modules. You may not have considered how to write structured code in Java. The approach is pretty simple: Modules can be implemented as static methods in the program class along with the main() method.
2. In general, an assembler is based on one or more loops that pass through the source code doing various things. The MISC assembly language requires that all variables be declared up front in the data segment. The example program does not contain any ifs or other statements that would cause a forward reference. These two facts together mean that it is possible to write a one pass assembler. You are welcome to do so. More details are given in point 4.
3. Here is some information on how I dealt with the symbol table. I implemented it as an instance of the Java class Hashtable. This allows you to enter a key and a value and retrieve the value later based on the key, using the methods put() and get(). A key has to be a reference to an object of a class which implements both the hashCode() and the equals() methods. The String class satisfies this requirement, so I used keys consisting of Strings containing the identifiers for variables and labels. The values can't be simple types; they also have to be references. I used instances of the MachineByteV1 class as the values. This was convenient because that class has methods for converting to and from integer and String representations, copyIntToByte() and getStringFromByte(). The current line
number is entered as the value; this is the address of the key. In order to save the trouble of continually passing the symbol table and line number, I made them static variables of the program class, alongside main(). Incidentally, it was also helpful to write another static method that could convert from hex in order to deal with numeric constants.
4. Here is a brief summary of the structure of the assembler as I implemented it. There is a loop that takes in input and does output. Within the loop there is a series of if statements to call the translation modules. Every line of assembly language code begins with a directive, an instruction, or a variable name. These are the three overall categories for the submodules. Along the way the line number has to be maintained, and various operations affect the symbol table.

main(): Loop, reading characters and forming the units of the lines of assembly code. Call the translation modules in an if statement based on the contents of the first unit. Write the translation out based on the module return values. Increment the line count.

handleData(): Keep the line count correct.

handleCode(): Fill blank lines if there were fewer than 8 variables. Keep the line count correct.

handleLabel(): Make an entry in the symbol table. Keep the line count correct.

handleEnd(): Supply the string of *s at the end. Return line of code.

handleMove(): Deal with two cases. Return line of code.
    MOVE REG CONST (CONST: hex conversion)
    MOVE REG MEM (MEM: symbol table access)

handleAdd(): Deal with two cases. Return line of code.
    ADD REG CONST (CONST: hex conversion)
    ADD MEM REG (MEM: symbol table access)

handleSub(): Deal with SUB REG REG. Return line of code.

handleJpos(): Deal with the operand (symbol table access). Return line of code.

handleVar(): Update symbol table. Return line of code.

hexStringToInt(): Return converted value (hex conversion).
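
As an illustration only, here is a minimal Java sketch of three of the supporting pieces described above: a static symbol table with a static line number, a hex conversion method, and the choice of translation module based on the first unit of a line. It is not the reference solution. The class name AssemblerSketch and the methods defineSymbol(), lookupSymbol(), and moduleFor() are hypothetical names made up for this sketch. It stores Integer addresses instead of MachineByteV1 instances, it assumes hex constants are bare hex digits, and it assumes directives can be recognized by a leading period. Check miscinfoassembly.doc for the actual syntax. It does not produce any machine code.

```java
import java.util.Hashtable;

// Hypothetical sketch: the names and the syntax assumptions are not from the assignment.
class AssemblerSketch {
    // Static, so the modules do not have to pass these around.
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();
    static int lineNumber = 0;

    // Convert a hex constant to an int.
    // Assumption: the argument contains only hex digits, with no prefix.
    static int hexStringToInt(String hex) {
        return Integer.parseInt(hex.trim(), 16);
    }

    // Enter an identifier, using the current line number as its address.
    static void defineSymbol(String name) {
        symbolTable.put(name, lineNumber);
    }

    // Retrieve the address of an identifier; fail loudly if it was never entered.
    static int lookupSymbol(String name) {
        Integer address = symbolTable.get(name);
        if (address == null) {
            throw new IllegalArgumentException("undefined identifier: " + name);
        }
        return address;
    }

    // Pick the translation module from the first unit of a line.
    static String moduleFor(String firstUnit) {
        switch (firstUnit) {
            case "MOVE": return "handleMove";
            case "ADD":  return "handleAdd";
            case "SUB":  return "handleSub";
            case "JPOS": return "handleJpos";
            default:
                // Assumption: a directive starts with '.'; anything else is a variable name.
                return firstUnit.startsWith(".") ? "handleDirective" : "handleVar";
        }
    }

    public static void main(String[] args) {
        lineNumber = 1;
        defineSymbol("ACCUM");  // hypothetical variable name
        System.out.println("ACCUM is at line " + lookupSymbol("ACCUM"));
        System.out.println("0B as an int: " + hexStringToInt("0B"));
        System.out.println("JPOS goes to " + moduleFor("JPOS"));
    }
}
```

Returning the module name as a String keeps the sketch short. In a real assembler the same switch, or the equivalent series of if statements, would call the static handler methods directly and write out the lines of code they return.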