
Optimizing Assembler

David C. Pheanis and Rudolph Joseph Sterbenz


Computer Science and Engineering
Arizona State University
Tempe, Arizona 85287-5406

Abstract

An assembly-language program often has more than one valid machine-language encoding. Furthermore, a program's different encodings usually have different time and space requirements at run time. This paper examines the problem of translating an assembly-language program into an optimum machine-language encoding. The optimization problem is NP-complete, so a polynomial-time algorithm that always finds an optimum encoding is unlikely [4]. Some suboptimum algorithms are available; however, they place significant restrictions on assembly-language programmers and compilers, and some of the algorithms can produce invalid machine-language encodings of programs. This paper discusses a new optimization algorithm that (1) always generates correct encodings, (2) does not place any restrictions on programmers and compilers, and (3) eliminates the need for programmers to optimize machine-language encodings of programs manually.

1 Introduction

Assemblers translate assembly-language programs into machine code. A programmer or a compiler produces an assembly-language program, and the assembler translates the program into absolute machine code or relocatable machine code. Assemblers are therefore important, although often invisible, tools even for applications that use high-level languages. An assembler may actually be part of a compiler.

1.1 Optimization

Most assembly-language programs have more than one possible machine-language encoding. Multiple encodings are possible because some assembly-language instructions are multiple-form instructions: they have two or more machine-language forms. For example, many computers have a branch instruction that has both a short form and a long form, where the short form requires fewer bytes in memory than the long form requires. Besides consuming less memory, the short form usually executes faster than the long form.

Programmers and compilers can specify the form an assembler should use for each multiple-form instruction in a program. In general, however, programmers and compilers cannot accurately determine the shortest possible forms without actually assembling the program; ideally, assemblers should decide which forms to use.

When assembling a multiple-form instruction, an assembler can generate a particular form of the instruction if and only if the instruction has an operand with an expression whose value is within a certain range. If the expression is not within the proper range, the assembler must generate a longer form of the instruction (or report an error if there is not a longer form). Since the longest form of an instruction is more general than the shorter forms, an assembler could simply always choose the longest forms for instructions. To ensure the shortest encoding of a program, however, an assembler must choose the shortest possible form for each multiple-form instruction in the program.

The optimization problem is difficult because instructions can depend on each other. The expression that determines which forms are possible for a particular instruction can depend on the forms that the assembler selects for other instructions in the program.

1.2 Contributions and Limitations

Previous work in the area of optimizing assemblers ignores some requirements that are important for a general-purpose optimizing assembler. This paper specifies these important requirements.

Besides specifying the requirements, this paper describes a new algorithm that meets those requirements. The algorithm does not place any restrictions on assembly-language programs that people and compilers generate.
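To make the form-selection idea from Section 1.1 concrete, the following minimal sketch chooses the shortest form whose operand range contains an expression's value. This is not ASMK's implementation; the form table, sizes, ranges, and function name are all hypothetical illustrations rather than any particular processor's encodings.

```python
# Hypothetical sketch of multiple-form instruction selection.
# Each form lists a name, its size in bytes, and the range of
# operand values (e.g., branch displacements) it can encode.
# Forms are ordered shortest first, so the first match wins.
FORMS = [
    ("short", 2, -128, 127),           # one word, 8-bit displacement
    ("word",  4, -32768, 32767),       # two words, 16-bit displacement
    ("long",  6, -2**31, 2**31 - 1),   # three words, 32-bit displacement
]

def select_form(value):
    """Return (name, size) of the shortest form whose range
    contains value, or None if no form fits (an assembly error)."""
    for name, size, lo, hi in FORMS:
        if lo <= value <= hi:
            return name, size
    return None
```

For example, select_form(100) picks the short form, while select_form(1000) must skip it because 1000 exceeds an 8-bit displacement. The difficulty the paper addresses is that in a real program the value being tested depends on the sizes chosen for other instructions.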
2 The Optimization Problem

The examples in this paper use a subset of the assembly language that ASMK, an M68000 assembler at Arizona State University, accepts for the Motorola M68000 processor. In particular, we use an EQU statement that assigns a value to a label, IF and ENDIF statements that provide support for conditional assembly, and a BRA instruction that has a one-word form and a two-word form.

The IF and ENDIF statements delimit conditional-assembly blocks. The assembler assembles the statements in a conditional-assembly block only if the expression for the IF statement is true.

If the value of a BRA instruction's expression is in the range *+2-128 through * or the range *+4 through *+2+126 (where * is the address of the instruction), an assembler can translate the BRA statement into a one-word machine-language instruction; otherwise, the assembler must translate the BRA statement into a two-word machine-language instruction. An assembler can choose to generate the long form even when the short form would work.

    L1   BRA    *
    $$   IF     *-L1.EQ.2
         BRA    *
         BRA    *
    $$   ENDIF
         END

If the assembler uses the short form for the first BRA, the IF expression is true and the program assembles into three words of machine code. If the assembler uses the long form for the first BRA, the IF expression is false and the program assembles into two words.

Figure 1: Greedy Algorithms Don't Always Optimize

         ORG    $FFFF7FFC
         ADD.L  #2, D0
    L1   LEA.L  L1, A0
         LEA.L  L1, A1
         LEA.L  L1, A2
         END

Should an assembler select ADDQ or ADDI? If the assembler selects ADDQ, the program requires 20 bytes in memory (2 for ADDQ.L and 6 for each LEA.L). If the assembler selects ADDI, the program requires only 18 bytes in memory (6 for ADDI.L and 4 for each LEA.L).

Figure 2: A Greedy Algorithm Fails Again

2.1 Observations

The optimization problem in general does not include just branch instructions. Most computers have several types of multiple-form instructions, and an optimizing assembler should optimize all of them.

Some computers have multiple-form instructions with more than two forms, so a general-purpose optimization algorithm must be able to handle instructions that have three or more forms.

A greedy algorithm that optimizes each instruction individually will not always generate the shortest possible program because optimizing a particular instruction might make a program longer than not optimizing the instruction. Figure 1 provides an admittedly contrived example that, in spite of being contrived, illustrates the challenge that conditional assembly presents. Figure 2 provides a more realistic example that does not use conditional assembly.

Figure 3 shows that a multi-pass optimizing assembler must be careful to guarantee that it always terminates. We use conditional assembly in this contrived example to illustrate the point simply, but similar circumstances arise in real programs, even without conditional assembly. The assembler must choose the long form of the BRA instruction in Figure 3 to avoid an infinite loop.

    L1   BRA    L2
    $$   IF     *-L1.EQ.2
    L2   EQU    L1+2+128
    $$   ENDIF
    $$   IF     *-L1.EQ.4
    L2   EQU    L1
    $$   ENDIF
         END

If an assembler selects the short form for the BRA, the destination of the branch is too far away for a short branch. If the assembler selects the long form for the BRA, the destination of the branch is close enough for the short form to work. A multi-pass optimizing assembler might select the short form during pass one, the long form during pass two, the short form during pass three, etc.

Figure 3: An Assembly Might Never Terminate

Optimization can adversely affect the behavior of programs that make assumptions about the sizes of instructions (Figure 4). Such assumptions are bad style (at best), so most programmers do not make them. Besides, a programmer never needs to assume that an assembler will generate a particular form for an instruction because a programmer can always force the assembler to generate any form that she or he wants. (With an M68000 assembler, a programmer can use BRA.B to force the short form of a branch instruction, or BRA.W to force the long form.)

    L1   BRA    L2
    L2   EQU    L1+4
         BRA    *+500
         END

A non-optimizing assembler might use the long form for the first BRA instruction because it makes a forward reference; however, an optimizing assembler would use the short form.

Figure 4: Optimization Can Change a Program

2.2 Requirements for an Assembler

A general-purpose optimizing assembler for both programmers and compilers must meet the following four requirements:

1. The assembler must always generate a correct machine-language encoding of an error-free assembly-language program; the encoding might not be optimal, but it must be correct. For a program that contains assembly errors, the assembler must report the errors.

2. The assembler must not place any restrictions on the contents of assembly-language programs. Programmers and compilers must be free to generate any assembly-language program. The assembler must not refuse to assemble a program (or, even worse, generate incorrect machine code without reporting an error) simply because the program is hard to optimize.

3. The assembler must select the smallest possible form for most multiple-form instructions in most programs. Unfortunately, this requirement is not very specific. The goal is to require an acceptable amount of optimization without demanding complete optimization of all instructions in all programs because such a demand would necessitate an exponential algorithm (assuming P ≠ NP).

4. The assembler must work with all of the computer's multiple-form instructions, not just branch instructions; also, the assembler must take advantage of any multiple-form instructions that have more than two forms.

3 Previous Work

Traditional one-pass and two-pass assemblers optimize instructions only when they don't include any forward references. Richards [2] describes an exhaustive-search algorithm for optimizing forward-reference instructions that have two forms. Frieder and Saal [1] extend Richards' work, but their algorithm is still O(2^n) in execution time and O(n^2) in space, both of which are entirely unacceptable.

Szymanski [4] describes an algorithm from an old Unix assembler that selects long forms first and then makes one extra pass to shorten long branches where possible. This algorithm typically misses many possible optimizations since it does not consider optimizations that become possible only as a result of other (subsequent) optimizations.

Szymanski also describes an algorithm that selects short forms first and later changes them to long forms. He restricts expressions to the form label ± constant, where label specifies the address of an instruction, and his algorithm is O(n^2) in time and O(n) in space.

The existing algorithms that optimize forward branches are adequate for the back ends of some compilers, but they are not good enough for use in general-purpose assemblers. The traditional algorithm, which optimizes only backward references, is the only algorithm that does not place restrictions on the programs that people and compilers can generate. For example, an assembler that provides conditional-assembly statements could not use any of the existing algorithms (except for the traditional algorithm) without limiting programmers and compilers in some way. Most general-purpose assemblers do not optimize forward references, probably because the available optimization algorithms are not good enough.

4 New Optimizing Algorithm

This section describes a new polynomial-time algorithm that optimizes many practical programs. The algorithm does not always generate optimum machine code, but it always generates correct machine code. The algorithm meets all the requirements from Section 2.2.

We have implemented this algorithm in ASMK, a full-featured assembler for the M68000 [3]. ASMK makes one more pass over a source program than the algorithm requires; during the extra pass, ASMK generates a table of contents and creates some tables for the linker.

The algorithm takes a greedy approach and optimizes each instruction individually. The greedy approach is optimal for most programs; however, Section 2.1 shows that the greedy approach is not perfect because optimizing some instructions might make a program longer than not optimizing those instructions.

The algorithm makes multiple passes over a source program. During the first pass, the algorithm uses the shortest forms for multiple-form instructions that make forward references.
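A minimal sketch of this multi-pass approach, under stated assumptions: the program model below is hypothetical (a list of branches and fixed-size items with labeled targets), the short-form range is simplified to -128..127 rather than ASMK's exact BRA ranges, and conditional assembly, GROWTH anticipation, and relocation are all omitted. It is an illustration of the "start short, only lengthen, stop at a fixed point" structure, not ASMK's code.

```python
# Simplified sketch of a multi-pass "start short, only lengthen"
# assembler.  program: list of ("fixed", size_in_bytes) or
# ("bra", label_name); labels: dict mapping label names to
# instruction indices.  All names here are hypothetical.

SHORT, LONG = 2, 4          # BRA sizes in bytes (one word / two words)

def assemble(program, labels):
    """Return the chosen size in bytes of every instruction."""
    # Pass one: assume the short form for every branch.
    sizes = [SHORT if op == "bra" else arg for op, arg in program]
    while True:
        # Compute each instruction's address from the current sizes.
        addr, addrs = 0, []
        for s in sizes:
            addrs.append(addr)
            addr += s
        changed = False
        for i, (op, arg) in enumerate(program):
            if op != "bra" or sizes[i] == LONG:
                continue                    # never shrink a long form
            # Displacement is relative to the word after the opcode.
            disp = addrs[labels[arg]] - (addrs[i] + 2)
            if not -128 <= disp <= 127:     # short form cannot reach
                sizes[i] = LONG
                changed = True
        if not changed:                     # two passes agree: done
            return sizes
```

Because a branch can only grow and there are finitely many branches, the loop must reach a fixed point, which mirrors the termination argument above: the algorithm never selects a shorter form once a pass has selected a long one.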
During pass i, where 2 ≤ i ≤ N − 1 and N is the total number of passes, the algorithm uses a longer form of an instruction (that makes forward references) if label values from pass i − 1 (and earlier parts of pass i) indicate that the instruction must be longer; also, to ensure termination, the algorithm never selects a shorter form of an instruction if the previous pass selected a long form. The algorithm continues to make passes until the algorithm makes identical decisions during two consecutive passes. For an assembly that requires more than four passes, for 3 ≤ i ≤ N − 2, pass i lengthens instructions that must be longer due to instructions that pass i − 1 lengthened. Pass N − 1 is the first pass during which the algorithm does not lengthen any instructions. The label values that pass N − 1 finds are the final values for all of the labels.

Pass N (the final pass) makes the same decisions that pass N − 1 made. The purpose of the final pass is to generate machine code. The algorithm does not generate machine code during pass N − 1 because the algorithm does not know the value of N until after pass N − 1 completes. To avoid pass N, the algorithm could generate machine code during pass two, and then again during each subsequent pass; however, generating machine code during multiple passes is probably less efficient than making the extra pass.

During each pass, the algorithm keeps track of GROWTH, the number of bytes by which the machine-language encoding of the program has grown during the pass. The algorithm uses GROWTH to anticipate growth in the values of labels that are forward references. Anticipating growth improves optimization. Consider, for example, a branch instruction that contains a forward reference. Since many instructions may grow from a short form to a long form from one pass to the next, the value of the label during one pass can be much greater than the value during the previous pass; if the algorithm does not anticipate growth in the value of the label, the label value could appear to require a large negative offset, even though the true label value might require a small positive offset.

We cannot describe the entire algorithm in detail here. Sterbenz [3] explains many details that we have omitted, including extensions that are necessary for relocatable assembly. He also provides a proof of termination and a proof that the algorithm always generates a correct encoding. Additionally, he analyzes the algorithm's performance and proves it to be O(n^2) in time and O(n) in space.

In actual practice, the algorithm optimizes real programs with only five passes in typical cases and with seven passes in the worst case ever encountered among real programs (i.e., excluding contrived cases), so the algorithm's execution time is O(n) in practice. The assembler has neglected to optimize only one statement out of more than a million lines of code in real programs that we have assembled. For practical purposes, therefore, the algorithm optimizes programs completely.

5 Conclusion

Optimizing multiple-form instructions in assembly-language programs is a hard problem. Most optimizing compilers optimize multiple-form branch instructions; however, the available optimization algorithms are not adequate for general-purpose assemblers that assembly-language programmers and compilers use.

This paper described a new optimization algorithm that always generates correct encodings of programs without placing any restrictions on programmers and compilers. The new algorithm is suitable for use in general-purpose assemblers for both programmers and compilers. The algorithm always terminates with a correct machine-language encoding of an input assembly-language program. Also, an implementation of the algorithm in an existing assembler helped verify that the algorithm does a good job of optimization in practice.

The new optimization algorithm is significant to assembly-language programmers because it eliminates the drudgery of manually optimizing time-critical assembly-language programs.

The new algorithm is significant to compiler writers because it gives code generators the freedom to generate any assembly-language programs without having to worry about the restrictions that other optimization algorithms place on assembly-language programs.

References

[1] G. Frieder and H. J. Saal, "A Process for the Determination of Addresses in Variable Length Addressing," Comm. ACM, 19:335-338, 1976.

[2] D. L. Richards, "How to Keep the Addresses Short," Comm. ACM, 14:346-349, 1971.

[3] Rudolph J. Sterbenz, "Optimizing Assembler," Thesis, Arizona State University, Tempe, Arizona, 1992.

[4] T. G. Szymanski, "Assembling Code for Machines with Span-Dependent Instructions," Comm. ACM, 21:300-308, 1978.
