Computer Science and Engineering Arizona State University Tempe, Arizona 85287-5406
Abstract

An assembly-language program often has more than one valid machine-language encoding. Furthermore, a program's different encodings usually have different time and space requirements at run time. This paper examines the problem of translating an assembly-language program into an optimum machine-language encoding. The optimization problem is NP-complete, so a polynomial-time algorithm that always finds an optimum encoding is unlikely [4]. Some suboptimum algorithms are available; however, they place significant restrictions on assembly-language programmers and compilers, and some of the algorithms can produce invalid machine-language encodings of programs. This paper discusses a new optimization algorithm that (1) always generates correct encodings, (2) does not place any restrictions on programmers and compilers, and (3) eliminates the need for programmers to optimize machine-language encodings of programs manually.
1 Introduction

Assemblers translate assembly-language programs into machine code. A programmer or a compiler produces an assembly-language program, and the assembler translates the program into absolute machine code or relocatable machine code. Assemblers are therefore important, although often invisible, tools even for applications that use high-level languages. An assembler may actually be part of a compiler.
1.1 Optimization

Most assembly-language programs have more than one possible machine-language encoding. Multiple encodings are possible because some assembly-language instructions are multiple-form instructions: they have two or more machine-language forms. For example, many computers have a branch instruction that has both a short form and a long form, where the short form requires fewer bytes in memory than the long form requires. Besides consuming less memory, the short form usually executes faster than the long form. Programmers and compilers can specify the form an assembler should use for each multiple-form instruction in a program. In general, however, programmers and compilers cannot accurately determine the shortest possible forms without actually assembling the program; ideally, assemblers should decide which forms to use.

When assembling a multiple-form instruction, an assembler can generate a particular form of the instruction if and only if the instruction has an operand with an expression whose value is within a certain range. If the expression is not within the proper range, the assembler must generate a longer form of the instruction (or report an error if there is not a longer form). Since the longest form of an instruction is more general than the shorter forms, an assembler could simply always choose the longest forms for instructions. To ensure the shortest encoding of a program, however, an assembler must choose the shortest possible form for each multiple-form instruction in the program.

The optimization problem is difficult because instructions can depend on each other. The expression that determines which forms are possible for a particular instruction can depend on the forms that the assembler selects for other instructions in the program.

1.2 Contributions and Limitations

Previous work in the area of optimizing assemblers ignores some requirements that are important for a general-purpose optimizing assembler. This paper specifies these important requirements. Besides specifying the requirements, this paper describes a new algorithm that meets those requirements. The algorithm does not place any restrictions on the assembly-language programs that people and compilers generate.
2 The Optimization Problem

The examples in this paper use a subset of the assembly language that ASMK, an M68000 assembler at Arizona State University, accepts for the Motorola M68000 processor. In particular, we use an EQU statement that assigns a value to a label, IF and ENDIF statements that provide support for conditional assembly, and a BRA instruction that has a one-word form and a two-word form.

The IF and ENDIF statements delimit conditional-assembly blocks. The assembler assembles the statements in a conditional-assembly block only if the expression for the IF statement is true.

If the value of a BRA instruction's expression is in the range *+2-128 through * or the range *+4 through *+2+126 (where * is the address of the instruction), an assembler can translate the BRA statement into a one-word machine-language instruction; otherwise, the assembler must translate the BRA statement into a two-word machine-language instruction. An assembler can choose to generate the long form even when the short form would work.

    L1      BRA     *
    $$      IF      *-L1.EQ.2
            BRA     *
            BRA     *
    $$      ENDIF
            END

If the assembler uses the short form for the first BRA, the IF expression is true and the program assembles into three words of machine code. If the assembler uses the long form for the first BRA, the IF expression is false and the program assembles into two words.

Figure 1: Greedy Algorithms Don't Always Optimize

            ORG     $FFFF7FFC
            ADD.L   #2, D0
    L1      LEA.L   L1, A0
            LEA.L   L1, A1
            LEA.L   L1, A2
            END

Should an assembler select ADDQ or ADDI? If the assembler selects ADDQ, the program requires 20 bytes in memory (2 for ADDQ.L and 6 for each LEA.L). If the assembler selects ADDI, the program requires only 18 bytes in memory (6 for ADDI.L and 4 for each LEA.L).

Figure 2: A Greedy Algorithm Fails Again
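The one-word range rule above reduces to a small predicate. The following Python sketch is ours, not ASMK code; it assumes byte addresses and passes the value of * (the instruction's address) as the argument star.

```python
def bra_short_ok(star, target):
    """True when a one-word BRA at address `star` (the value of *) can
    reach `target`: *+2-128 through * backward, or *+4 through *+2+126
    forward."""
    return (star + 2 - 128 <= target <= star) or \
           (star + 4 <= target <= star + 2 + 126)
```

For a branch at address 0, for example, targets from -126 back through 0 and from 4 forward through 128 fit the one-word form; a target at 130 forces the two-word form.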
2.1 Observations

The optimization problem in general does not include just branch instructions. Most computers have several types of multiple-form instructions, and an optimizing assembler should optimize all of them. Some computers have multiple-form instructions with more than two forms, so a general-purpose optimization algorithm must be able to handle instructions that have three or more forms.

A greedy algorithm that optimizes each instruction individually will not always generate the shortest possible program because optimizing a particular instruction might make a program longer than not optimizing the instruction. Figure 1 provides an admittedly contrived example that, in spite of being contrived, illustrates the challenge that conditional assembly presents. Figure 2 provides a more realistic example that does not use conditional assembly.

Figure 3 shows that a multi-pass optimizing assembler must be careful to guarantee that it always terminates. We use conditional assembly in this contrived example to illustrate the point simply, but similar circumstances arise in real programs, even without conditional assembly. The assembler must choose the long form of the BRA instruction in Figure 3 to avoid an infinite loop.

    L1      BRA     L2
    $$      IF      *-L1.EQ.2
    L2      EQU     L1+2+128
    $$      ENDIF
    $$      IF      *-L1.EQ.4
    L2      EQU     L1
    $$      ENDIF
            END

If an assembler selects the short form for the BRA, the destination of the branch is too far away for a short branch. If the assembler selects the long form for the BRA, the destination of the branch is close enough for the short form to work. A multi-pass optimizing assembler might select the short form during pass one, the long form during pass two, the short form during pass three, etc.

Figure 3: An Assembly Might Never Terminate

Optimization can adversely affect the behavior of programs that make assumptions about the sizes of instructions (Figure 4). Such assumptions are bad style (at best), so most programmers do not make them. Besides, a programmer never needs to assume that an assembler will generate a particular form for an instruction because a programmer can always force the assembler to generate any form that she or he wants. (With an M68000 assembler, a programmer can use BRA.B to force the short form of a branch instruction, or BRA.W to force the long form.)

    L1      BRA     L2
    L2      EQU     L1+4
            BRA     *+500
            END

A non-optimizing assembler might use the long form for the first BRA instruction because it makes a forward reference; however, an optimizing assembler would use the short form.

Figure 4: Optimization Can Change a Program
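The oscillation in Figure 3 is easy to reproduce. In this sketch (our own simplification; required_form is a hypothetical name), the conditional-assembly block is evaluated for a chosen BRA size, and the function returns the size the BRA then actually needs; a naive assembler that always adopts the returned form flips between the two forms forever.

```python
def required_form(chosen_form):
    """Assemble Figure 3's `L1 BRA L2` assuming the BRA occupies
    `chosen_form` bytes (2 or 4); return the form the BRA then needs."""
    L1 = 0
    # The IF/ENDIF blocks define L2 from the size the assembler chose:
    # a short BRA puts L2 just out of short range; a long BRA puts it back in.
    L2 = L1 + 2 + 128 if chosen_form == 2 else L1
    star = L1
    short_ok = (star + 2 - 128 <= L2 <= star) or \
               (star + 4 <= L2 <= star + 2 + 126)
    return 2 if short_ok else 4
```

Choosing the short form demands the long form, and choosing the long form makes the short form legal again. The never-shorten rule of Section 4 breaks this cycle: once some pass selects the long form, every later pass keeps it, so the assembly terminates with the long form.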
2.2 Requirements for an Assembler

A general-purpose optimizing assembler for both programmers and compilers must meet the following four requirements:

1. The assembler must always generate a correct machine-language encoding of an error-free assembly-language program; the encoding might not be optimal, but it must be correct. For a program that contains assembly errors, the assembler must report the errors.

2. The assembler must not place any restrictions on the contents of assembly-language programs. Programmers and compilers must be free to generate any assembly-language program. The assembler must not refuse to assemble a program (or, even worse, generate incorrect machine code without reporting an error) simply because the program is hard to optimize.

3. The assembler must select the smallest possible form for most multiple-form instructions in most programs. Unfortunately, this requirement is not very specific. The goal is to require an acceptable amount of optimization without demanding complete optimization of all instructions in all programs because such a demand would necessitate an exponential algorithm (assuming P ≠ NP).

4. The assembler must work with all of the computer's multiple-form instructions, not just branch instructions; also, the assembler must take advantage of any multiple-form instructions that have more than two forms.

3 Previous Work

Traditional one-pass and two-pass assemblers optimize instructions only when they don't include any forward references. Richards [2] describes an exhaustive-search algorithm for optimizing forward-reference instructions that have two forms. Frieder and Saal [1] extend Richards' work, but their algorithm is still O(2^n) in execution time and O(n^2) in space, both of which are entirely unacceptable.

Szymanski [4] describes an algorithm from an old Unix assembler that selects long forms first and then makes one extra pass to shorten long branches where possible. This algorithm typically misses many possible optimizations since it does not consider optimizations that become possible only as a result of other (subsequent) optimizations. Szymanski also describes an algorithm that selects short forms first and later changes them to long forms. He restricts expressions to the form label ± constant, where label specifies the address of an instruction, and his algorithm is O(n^2) in time and O(n) in space.

The existing algorithms that optimize forward branches are adequate for the back ends of some compilers, but they are not good enough for use in general-purpose assemblers. The traditional algorithm, which optimizes only backward references, is the only algorithm that does not place restrictions on the programs that people and compilers can generate. For example, an assembler that provides conditional-assembly statements could not use any of the existing algorithms (except for the traditional algorithm) without limiting programmers and compilers in some way. Most general-purpose assemblers do not optimize forward references, probably because the available optimization algorithms are not good enough.
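For concreteness, the long-forms-first scheme can be sketched over a toy program model. The model is our own, not Szymanski's notation: a program is a list of ('bra', label) and fixed two-byte ('word',) statements, and labels maps a label name to a statement index (index len(prog) names the end of the program).

```python
def long_first(prog, labels):
    """Start with every BRA long, then make one pass shortening the
    branches whose targets are in short range at long-form addresses."""
    sizes = [4 if ins[0] == 'bra' else 2 for ins in prog]
    addrs = [0]
    for s in sizes:                       # addresses under long forms
        addrs.append(addrs[-1] + s)
    for i, ins in enumerate(prog):
        if ins[0] == 'bra':
            star, target = addrs[i], addrs[labels[ins[1]]]
            if (star + 2 - 128 <= target <= star) or \
               (star + 4 <= target <= star + 2 + 126):
                sizes[i] = 2              # shorten; addresses not recomputed
    return sizes
```

Because the pass never recomputes addresses, a branch that comes into range only after other branches shrink stays long. With two branches to a label 130 bytes away under long-form addresses, for instance, the pass shortens the second branch but leaves the first long, although making both branches short would bring the target within range of both.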
4 New Optimizing Algorithm

This section describes a new polynomial-time algorithm that optimizes many practical programs. The algorithm does not always generate optimum machine code, but it always generates correct machine code. The algorithm meets all the requirements from Section 2.2.

We have implemented this algorithm in ASMK, a full-featured assembler for the M68000 [3]. ASMK makes one more pass over a source program than the algorithm requires; during the extra pass, ASMK generates a table of contents and creates some tables for the linker.

The algorithm takes a greedy approach and optimizes each instruction individually. The greedy approach is optimal for most programs; however, Section 2.1 shows that the greedy approach is not perfect because optimizing some instructions might make a program longer than not optimizing those instructions.

The algorithm makes multiple passes over a source program. During the first pass, the algorithm uses the shortest forms for multiple-form instructions that make forward references. During pass i, where 2 ≤ i ≤ N − 1 and N is the total number of passes, the algorithm uses a longer form of an instruction (that makes forward references) if label values from pass i − 1 (and earlier parts of pass i) indicate that the instruction must be longer; also, to ensure termination, the algorithm never selects a shorter form of an instruction if the previous pass selected a long form. The algorithm continues to make passes until it makes identical decisions during two consecutive passes. For an assembly that requires more than four passes, for 3 ≤ i ≤ N − 2, pass i lengthens instructions that must be longer due to instructions that pass i − 1 lengthened. Pass N − 1 is the first pass during which the algorithm does not lengthen any instructions. The label values that pass N − 1 finds are the final values for all of the labels.
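The pass structure can be sketched as a fixed-point iteration over a toy program model (our own: ('bra', label) and fixed two-byte ('word',) statements, with labels mapping a label name to a statement index and index len(prog) naming the end of the program). The real algorithm handles full expressions and every multiple-form instruction [3]; this sketch also simplifies by using only start-of-pass label values.

```python
def optimize_sizes(prog, labels):
    """Pass 1 assumes the shortest forms; each later pass only lengthens
    (never shortens) branches whose targets are out of range, and the
    iteration stops when a pass changes nothing."""
    sizes = [2] * len(prog)
    while True:
        addrs = [0]
        for s in sizes:                   # label values for this pass
            addrs.append(addrs[-1] + s)
        changed = False
        for i, ins in enumerate(prog):
            if ins[0] == 'bra' and sizes[i] == 2:   # never shorten a long form
                star, target = addrs[i], addrs[labels[ins[1]]]
                if not ((star + 2 - 128 <= target <= star) or
                        (star + 4 <= target <= star + 2 + 126)):
                    sizes[i] = 4
                    changed = True
        if not changed:                   # two consecutive identical passes
            return sizes
```

Termination follows because sizes only grow and each size is bounded above by the instruction's longest form.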
Pass N (the final pass) makes the same decisions that pass N − 1 made. The purpose of the final pass is to generate machine code. The algorithm does not generate machine code during pass N − 1 because the algorithm does not know the value of N until after pass N − 1 completes. To avoid pass N, the algorithm could generate machine code during pass two, and then again during each subsequent pass; however, generating machine code during multiple passes is probably less efficient than making the extra pass.

During each pass, the algorithm keeps track of GROWTH, the number of bytes by which the machine-language encoding of the program has grown during the pass. The algorithm uses GROWTH to anticipate growth in the values of labels that are forward references. Anticipating growth improves optimization. Consider, for example, a branch instruction that contains a forward reference. Since many instructions may grow from a short form to a long form from one pass to the next, the value of the label during one pass can be much greater than the value during the previous pass; if the algorithm does not anticipate growth in the value of the label, the label value could appear to require a large negative offset, even though the true label value might require a small positive offset.
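The anticipation step can be sketched as follows (names are ours): when a pass tests a forward reference, the label's value is known only from the previous pass, so the sketch adds the bytes the program has grown so far during the current pass (GROWTH) before testing the range.

```python
def forward_fits_short(star, prev_pass_value, growth):
    """`star` is the branch's address in the current pass; `prev_pass_value`
    is the forward label's value from the previous pass, which understates
    the label's final address when code before it has grown by `growth`
    bytes during the current pass."""
    target = prev_pass_value + growth     # anticipate growth in the label
    return (star + 2 - 128 <= target <= star) or \
           (star + 4 <= target <= star + 2 + 126)
```

Without the adjustment, a stale forward-label value can even fall below star, so the branch appears to need a large negative offset although its true offset is small and positive; with the adjustment, the short form is correctly judged legal.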
We cannot describe the entire algorithm in detail here. Sterbenz [3] explains many details that we have omitted, including extensions that are necessary for relocatable assembly. He also provides a proof of termination and a proof that the algorithm always generates a correct encoding. Additionally, he analyzes the algorithm's performance and proves it to be O(n^2) in time and O(n) in space.

In actual practice, the algorithm optimizes real programs with only five passes in typical cases and with seven passes in the worst case ever encountered among real programs (i.e., excluding contrived cases), so the algorithm's execution time is O(n) in practice. The assembler has neglected to optimize only one statement out of more than a million lines of code in real programs that we have assembled. For practical purposes, therefore, the algorithm optimizes programs completely.

5 Conclusion

Optimizing multiple-form instructions in assembly-language programs is a hard problem. Most optimizing compilers optimize multiple-form branch instructions; however, the available optimization algorithms are not adequate for general-purpose assemblers that assembly-language programmers and compilers use.

This paper described a new optimization algorithm that always generates correct encodings of programs without placing any restrictions on programmers and compilers. The new algorithm is suitable for use in general-purpose assemblers for both programmers and compilers. The algorithm always terminates with a correct machine-language encoding of an input assembly-language program. Also, an implementation of the algorithm in an existing assembler helped verify that the algorithm does a good job of optimization in practice.

The new optimization algorithm is significant to assembly-language programmers because it eliminates the drudgery of manually optimizing time-critical assembly-language programs. The new algorithm is significant to compiler writers because it gives code generators the freedom to generate any assembly-language programs without having to worry about the restrictions that other optimization algorithms place on assembly-language programs.

References

[1] G. Frieder and H. J. Saal, "A Process for the Determination of Addresses in Variable Length Addressing," Comm. ACM, 19:335-338, 1976.

[2] D. L. Richards, "How to Keep the Addresses Short," Comm. ACM, 14:346-349, 1971.

[3] Rudolph J. Sterbenz, "Optimizing Assembler," Thesis, Arizona State University, Tempe, Arizona, 1992.

[4] T. G. Szymanski, "Assembling Code for Machines with Span-Dependent Instructions," Comm. ACM, 21:300-308, 1978.