
The Python Compiler for CMU Common Lisp

Robert A. MacLachlan
Carnegie Mellon University

Abstract

The Python compiler for CMU Common Lisp has been under development for over five years, and now forms the core of a production quality public domain Lisp implementation. Python synthesizes the good ideas from Lisp compilers and source transformation systems with mainstream optimization and retargetability techniques. Novel features include strict type checking and source-level debugging of compiled code. Unusual attention has been paid to the compiler's user interface.

1 Design Overview

CMU Common Lisp is a public domain implementation of Common Lisp. The intended application of this implementation is as a high-end Lisp development environment. Rather than maximum peak performance or low memory use, the main design goals are the ease of attaining reasonable performance, of debugging, and of developing programs at a high level of abstraction.

CMU Common Lisp is portable, but does not sacrifice efficiency or functionality for ease of retargeting. It currently runs on three processors: MIPS R3000, SPARC, and IBM RT PC. Python has 50,000 lines of machine-independent code. Each back-end is 8,000 lines of code (usually derived from an existing back-end, not written from scratch.) Porting takes 2-4 wizard-months.

An unusual amount of effort was devoted to user-interface aspects of compilation, even in comparison to commercial products. One of the theses of this paper is that many of Lisp's problems are best seen as user-interface problems.

In a large language like Common Lisp, efficient compiler algorithms and clever implementation can still be defeated in the details. In 1984, Brooks and Gabriel[5] wrote of Common Lisp that:

    Too many costs of the language were dismissed with the admonition that "any good compiler" can take care of them. No one has yet written, nor is likely to without tremendous effort, a compiler that does a fraction of the tricks expected of it. At present, the "hope" for all Common Lisp implementors is that an outstandingly portable Common Lisp compiler will be written and widely used.

Python has a lot of operation-specific knowledge: there are over a thousand special-case rules for 640 different operations, 14 flavors of function call and three representations for multiple values. The + function has 6 transforms, 8 code generators and a type inference method. This functionality is not without a cost; Python is bigger and slower than commercial CL compilers.

2 Type Checking
Python brings strong typing to Common Lisp. Type declarations are optional in Common Lisp, but Python's checking of any type assertions that do appear is both precise and eager:

Precise -- All type assertions are tested using the exact type asserted. All type assertions (both implicit and declared) are treated equally.
Eager -- Run-time type checks are done at the location of the first type assertion, rather than being delayed until the value is actually used.

Explicit type declarations are neither trusted nor ignored; declarations simply provide additional assertions which the type checker verifies (with a run-time test if necessary) and then exploits. This eager type checking allows the compiler to use specialized variable representations in safe code.

Because declared types are precisely checked, declarations become more than an efficiency hack. They provide a mechanism for introducing moderately complex consistency assertions into code. Type declarations are more powerful than assert and check-type because they make it easier for the compiler to propagate and verify assertions.

2.1 Run-Time Errors

Debuggers for other languages have caught up to Lisp, but Lisp programs often remain easier to debug because run-time errors are signalled sooner after the inconsistency arises and contain more useful information.

Debugger capabilities are important, but the time that the debugger is invoked is also very important. There is only so much a debugger can do after a program has already crashed and burned. Strict type checking improves the quality and timeliness of run-time error messages.

The efficiency penalty of this run-time type checking would be prohibitive if Python did not both minimize run-time checks through compile-time type inference and apply optimizations to minimize the cost of the residual type checks.

2.2 Compile-Time Type Errors

In addition to proving that a type check is unnecessary, type inference may also prove that the asserted and derived types are inconsistent. In the latter case, the code fragment can never be executed without a type error, so a compile-time warning is appropriate. Compiling this example:

(defmacro addf (x n)
  `(+ ,x (the single-float ,n)))

(defun type-error (x)
  (addf x 3))

gives this compiler warning:

In: defun type-error
  (addf x 3)
--> +
==>
  (the single-float 3)
Warning: This is not a single-float:
  3

In addition to detecting static type errors, flow analysis detects simple dynamic type errors like:

(defun foo (x arg)
  (ecase x
    (:register-number
     (and (integerp arg) (>= x 0)))
    ...))

In: defun foo
  (>= x 0)
--> if <
==>
  x
Warning: This is not a (or float rational):
  :register-number

Johnson[8] shows both the desirability and feasibility of compile-time type checking as a way to reduce the brittleness (or increase the robustness) of Lisp programs developed using an exploratory methodology.

Python does less interprocedural type inference, so compile-time type checking serves mainly to reduce the number of compile-debug cycles spent fixing trivial type errors. Not all (or even most) programs can be statically checked, so run-time errors are still possible. Some errors can be detected without testing, but testing is always necessary.

Once the basic subtypep operation has been implemented, it can be used to detect any inconsistencies during the process of type inference, minimizing the difficulty of implementing compile-time type error checking. The importance of static type checking may be a subject of debate, but many consider it a requirement for developing reliable software. Perhaps compile-time type errors can be justified on these unquantifiable grounds alone.

2.3 Style Implications

Precise, eager type checking lends itself to a new coding style, where:
- Declarations are specified to be as precise as possible: (integer 3 7) instead of fixnum, (or node null) instead of no declaration.

- Declarations are written during initial coding rather than being postponed until tuning, since declarations now enhance debuggability rather than hurting it.

- Declarations are used with more confidence, since their correctness is tested during debugging.
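As a concrete illustration of this style, consider the following sketch (ours, not an example from the paper):

(defun digit-weight (d)
  ;; The precise range (integer 0 9) is declared instead of the looser
  ;; FIXNUM. Python checks the assertion eagerly: at compile time where
  ;; the argument type is inferable, otherwise with a run-time test.
  (declare (type (integer 0 9) d))
  (* d 10))

A call such as (digit-weight 42) then draws a type warning or a prompt run-time error, rather than silently computing 420.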
3 Accountability to Users

A Lisp implementation hides a great deal of complexity from the programmer. This makes it easier to write programs, but harder to attain good efficiency.[7] Unlike in simpler languages such as C, there is not a straightforward efficiency model which programmers can use to anticipate program performance.

In Common Lisp, the cost of generic arithmetic and other conceptually simple operations varies widely depending on both the actual operand types and on the amount of compile-time type information. Any accurate efficiency model would be so complex as to place unrealistic demands on programmers -- the workings of the compiler are (rightly) totally mysterious to most programmers.

A solution to this problem is to make the compiler accountable for its actions[15] by establishing a dialog with the programmer. This can be thought of as providing an automated efficiency model. In a compiler, accountability means that efficiency notes:

- are needed whenever an inefficient implementation is chosen for non-obvious reasons,

- should convey the seriousness of the situation,

- must explain precisely what part of the program is being referred to, and

- must explain why the inefficient choice was made (and how to enable efficient compilation.)

3.1 Efficiency Note Example

Consider this example of "not good enough" declarations:

(defun eff-note (s i x y)
  (declare (string s) (fixnum i x y))
  (aref s (+ i x y)))

which gives these efficiency notes:

In: defun eff-note
  (+ i x y)
==>
  (+ (+ i x) y)
Note:
Forced to do inline (signed-byte 32) arithmetic (cost 3).
Unable to do inline fixnum arithmetic (cost 2) because:
The first argument is a (integer -1073741824 1073741822),
not a fixnum.

  (aref s (+ i x y))
--> let*
==>
  (kernel:data-vector-ref array kernel:index)
Note:
Forced to do full call.
Unable to do inline array access (cost 5) because:
The first argument is a base-string,
not a simple-base-string.
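The notes point directly at a fix. Here is a sketch of a revision (ours, not from the paper) intended to satisfy both complaints:

(defun eff-note-fixed (s i x y)
  ;; SIMPLE-BASE-STRING permits open-coded array indexing, and
  ;; bounding each index keeps the sum (+ i x y) within fixnum range,
  ;; permitting inline fixnum arithmetic.
  (declare (simple-base-string s)
           (type (integer 0 1000) i x y))
  (aref s (+ i x y)))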
4 Source-Level Debugging

In addition to compiler messages, the other major user interface that Lisp implementations offer to the programmer is the debugger. Historically, debugging was done in interpreted code, and Lisp debuggers offered excellent functionality when compared to other programming environments. This was more due to the use of a dynamic-binding interpreter than to any great complexity of the debuggers themselves.

There has been a trend toward compiling Lisp programs during development and debugging. The first-order reason for this is that in modern implementations, compiled code is so much faster than the interpreter that interpreting complex programs has become impractical. Unfortunately, compiled debugging makes interpreter-based debugging tools like *evalhook* steppers much less available. It is usually not even possible to determine a more precise error location than "somewhere in this function."

One solution is to provide source-level debugging of compiled code. Although the idea has been around for a while[18], source-level debugging is not yet available in the popular stock hardware implementations. To provide useful source-level debugging, CMU Common Lisp had to overcome these implementation difficulties:

- Excessive space and compile-time penalty for dumping debug information.

- Excessive run-time performance penalty because debuggability inhibits important optimizations like register allocation and untagged representation.

- Poor correctness guarantees. Incorrect information is worse than none, and even very rare incorrectness destroys user confidence in the tool.
The Python symbolic debugger information is compact, allows debugging of optimized code and provides precise semantics. Source-based debugging is supported by a bidirectional mapping between locations in source code and object code. Due to careful encoding, the space increase for source-level debugging is less than 35%. Precise variable liveness information is recorded, so potentially incorrect values are not displayed, giving what Zellweger[22] calls truthful behavior.

4.1 Debugger Example

This program demonstrates some simple source-level debugging features:
(defstruct my-struct
  (slot nil :type (or integer null)))

(defun source-demo (structs)
  (dolist (struct structs)
    (when (oddp (my-struct-slot struct))
      (return struct))))

(source-demo
 (list (make-my-struct :slot 0)
       (make-my-struct :slot 2)
       (pathname "foo")))

Resulting in a run-time error and debugger session:

Type-error in source-demo:
  #p"foo" is not of type my-struct

Debug (type H for help)

; The usual call frame printing,
(source-demo (#s(my-struct slot 0)
              #s(my-struct slot 2)
              #p"foo"))

; But also local variable display by name,
6] l
struct = #p"foo"
structs = (#s(my-struct slot 0)
           #s(my-struct slot 2)
           #p"foo")

; evaluation of local variables in expressions, and
6] (my-struct-slot (cadr structs))
2

; display of the error location.
6] source 1
(oddp (#:***here*** (my-struct-slot struct)))

; and you can jump to editing the exact error location.
6] edit
4.2 The Debugger Toolkit

The implementation details of stack parsing, debug info and code representation are encapsulated by an interface exported from the debug-internals package. This makes it easy to layer multiple debugger user interfaces onto CMU CL. The interface is somewhat similar to Zurawski[23]. Unlike Smalltalk, Common Lisp does not define a standard representation of activation records, etc., so the toolkit must be defined as a language extension.
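As a sketch of what a client of this interface might look like, here is a minimal backtrace printer. The function names (di:top-frame, di:frame-down, di:frame-debug-function, di:debug-function-name) follow the CMU CL debug-internals documentation, but the example is our illustration, not code from the paper:

(defun print-simple-backtrace (&optional (limit 10))
  ;; Walk down from the top frame, printing each frame's function name.
  (do ((frame (di:top-frame) (di:frame-down frame))
       (i 0 (1+ i)))
      ((or (null frame) (>= i limit)))
    (format t "~2D: ~S~%" i
            (di:debug-function-name
             (di:frame-debug-function frame)))))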
5 Efficient Abstraction

Powerful high-level abstractions help make program maintenance and incremental design/development easier[20, 21]; this is why Common Lisp has such a rich variety of abstraction tools:

- Global and local functions, first-class functional values,

- Dynamic typing (ad-hoc polymorphism), generic functions and inheritance,

- Macros, embedded languages, and other language extensions.

Unfortunately, inefficiency often forces programmers to avoid appropriate abstraction. When the difficulty of efficient abstraction leads programmers to adopt a low-level programming style, they have forfeited the advantages of using a higher-level language like Lisp. In order to produce a high-quality Lisp development environment, it is not sufficient for the C-equivalent subset of Lisp to be as efficient as C -- the higher-level programming tools characteristic of Lisp must also be efficiently implemented.

Python increases the efficiency of abstraction in several ways:

- Extensive partial evaluation at the Lisp semantics level,

- Efficient compilation of the full range of Common Lisp features: local functions, closures, multiple values, arrays, numbers, structures, and

- Block compilation and inline expansion (synergistic with partial evaluation and untagged representations.)
5.1 Partial Evaluation

Partial evaluation is a generalization of constant folding[9]. Instead of only replacing completely constant expressions with the compile-time result of the expression, partial evaluation transforms arbitrary code using information on parts of the program that are constant. Here are a few sample transformations:

- Operation is constant:
  (funcall #'eql x y) => (eql x y)

- Variables whose bindings are invariant:
  (let ((x (+ a b))) x) => (+ a b)

- Unused expression deletion:
  (+ (progn (* a b) c) d) => (+ c d)

- Conditionals that are constant:
  (if t x y) => x
  (if (consp a-cons) x y) => x
  (if p (if p x y) z) => (if p x z)

The canonicalization of control flow and environment during ICR Conversion aids partial evaluation by making the value flow apparent.

Source transformation systems have demonstrated that partial evaluation can make abstraction efficient by eliminating unnecessary generality[15, 4], but only a few non-experimental compilers (such as Orbit[9]) have made comprehensive use of this technique. Partial evaluation is especially effective in combination with inline expansion[19, 13].

5.2 Partial Evaluation Example

Consider this analog of find which searches in a linked list threaded by a specified successor function:

(declaim (inline find-in))
(defun find-in (next element list &key (key #'identity)
                                       (test #'eql test-p)
                                       (test-not nil not-p))
  (when (and test-p not-p)
    (error "Both :Test and :Test-Not supplied."))
  (if not-p
      (do ((current list (funcall next current)))
          ((null current) nil)
        (unless (funcall test-not (funcall key current) element)
          (return current)))
      (do ((current list (funcall next current)))
          ((null current) nil)
        (when (funcall test (funcall key current) element)
          (return current)))))

When inline expanded and partially evaluated, the call

(find-in #'foo-next 3 foo-list
         :key #'foo-offset :test #'<)

is transformed into:

(do ((current foo-list (foo-next current)))
    ((null current) nil)
  (when (< (foo-offset current) 3)
    (return current)))

5.3 Meta-Programming and Partial Evaluation

Meta-programming is writing programs by developing domain-specific extensions to the language. It is an extremely powerful abstraction mechanism that Lisp makes comparatively easy to use. However, macro writing is still fairly difficult to master, and efficient macro writing is tedious because the expansion must be hand-optimized.

Partial evaluation makes meta-programming easier by making inline functions efficient enough so that macros can be reserved for cases where functional abstraction is insufficient. Such optimization also makes it simpler to write macros because unnecessary conditionals and variable bindings can be introduced into the expansion without hurting efficiency.
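As a small illustration of this division of labor (ours, not from the paper), a macro can confine itself to the genuinely syntactic part -- updating a place -- and delegate the computation to an inline function:

(declaim (inline clamp))
(defun clamp (x lo hi)
  ;; Ordinary functional abstraction; inline expansion and partial
  ;; evaluation reduce it to bare MIN/MAX code at each use.
  (max lo (min hi x)))

;; The macro layer handles only the place machinery.
(define-modify-macro clampf (lo hi) clamp)

A form like (clampf (aref v i) 0 255) then expands into a setf of the place, with the inline clamp reduced to open-coded comparisons.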
5.4 Block Compilation

Block compilation assumes that global functions will not be redefined, allowing the compiler to see across function boundaries[18].

In CMU Common Lisp, full source-level debugging is still supported in block-compiled functions. The only effect that block compilation has on debuggability is that the entire block must be recompiled when any function in it is redefined. Often block compilation is first used during tuning, so full-block recompilation is rarely necessary during development.

Block compilation reduces the basic call overhead because the control transfer is a direct jump and argument syntax checking is safely eliminated. Python also does keyword and optional argument parsing at compile-time.

Another advantage of block compilation is that locally optimized calling conventions can be used. In particular, numeric arguments and return values are passed in non-pointer registers.

In addition to reductions in the call overhead itself, block compilation is important because it extends type inference and partial evaluation across function boundaries. Block compilation allows some of the benefits of whole-program compilation[3] without excessive compile-time overhead or complete loss of incremental redefinition.

Python provides an extension which declares that only certain functions in a block compilation are entry points that can be called from outside the compilation unit. This is a syntactic convenience that effectively converts the non-entry defuns into an enclosing labels form. Whichever syntax is used, the entry point information allows greatly improved optimization because the compiler can statically locate all calls to block-local functions.
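A sketch of how such a compilation unit might be written (the start-block/end-block declaration names follow the CMU CL documentation; treat the details as illustrative):

;; Only REWRITE is an entry point; PASS-1 and PASS-2 become
;; block-local, so every call to them is statically visible and can
;; use a local calling convention.
(declaim (ext:start-block rewrite))

(defun pass-1 (form) form)   ; stand-in body
(defun pass-2 (form) form)   ; stand-in body

(defun rewrite (form)
  (pass-2 (pass-1 form)))

(declaim (ext:end-block))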
6 Numeric Efficiency

Efficient compilation of Lisp numeric operations has long been a concern[16, 6]. Even when type inference is sufficient to show that a value is of an appropriate numeric type, efficient implementation is still difficult because the use of tagged pointers to represent all objects is incompatible with efficient numeric representations.

Python offers a combination of several features which make efficient numeric code much easier to attain:

- Untagged numbers can be allocated both in registers and on the stack.

- Type inference and the wide palette of code generation strategies reduce the number of declarations needed to get efficient code.

- Block compilation increases the range over which efficient numeric representations can be used.

- Efficiency notes provide reliable feedback about numeric operations that were not compiled efficiently.

In addition to open-coding operations on single and double floats and fixnums, Python also implements full-word integer arithmetic on signed and unsigned 32 bit operands. Word-integer arithmetic is used to implement bignum operations, resulting in unusually good bignum performance.
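For example (our sketch, not from the paper), given the declarations below the float values can live in untagged registers and the loop body can be fully open-coded:

(defun dot (a b)
  (declare (type (simple-array single-float (*)) a b)
           (optimize (speed 3)))
  (let ((sum 0f0))
    (declare (single-float sum))
    ;; Both the arithmetic and the array accesses are on declared
    ;; single-floats, so no tagged intermediate results are needed.
    (dotimes (i (min (length a) (length b)) sum)
      (incf sum (* (aref a i) (aref b i))))))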
7 Implementation

Python's structure is broadly characterized by the compilation phases and the data structures (or intermediate representations) that they manipulate. Two major intermediate representations are used:

- The Implicit Continuation Representation (ICR) represents the Lisp-level semantics of the source code during the initial phases. ICR is roughly equivalent to a subset of Common Lisp, but is represented as a flow-graph rather than a syntax tree. The main ICR data structures are nodes, continuations and blocks. Partial evaluation and semantic analysis are done on this representation. Phases which only manipulate ICR comprise the "front end".

- The Virtual Machine Representation (VMR) represents the implementation of the source code on a virtual machine. The virtual machine may vary depending on the target hardware, but VMR is sufficiently stylized that most of the phases which manipulate it are portable. All values are stored in Temporary Names (TNs). All operations are Virtual OPerations (VOPs), and their operands are either TNs or compile-time constants.

When compared to the intermediate representations used by most Lisp compilers, ICR is lower level than an abstract syntax tree front-end representation, and VMR is higher level than a macro-assembler back-end representation.

8 Compilation Phases

Major phases are briefly described here. The phases from "local call analysis" to "type check generation" all interact; they are generally repeated until nothing new is discovered.

ICR conversion: Convert the source into ICR, doing macroexpansion and simple source-to-source transformation. All names are resolved at this time, so later source transformations can't introduce spurious name conflicts. See section 9.

Local call analysis: Find calls to local functions and convert them to local calls to the correct entry point, doing keyword parsing, etc. Recognize once-called functions as lets. Create entry stubs for functions that are subject to general function call.

Find components: Find the flow graph components and compute a depth-first ordering. Delete unreachable code. Separate top-level code from run-time code, and determine which components are top-level components.

ICR optimize: A grab-bag of all the non-flow ICR optimizations. Fold constant functions, propagate types and eliminate code that computes unused values. Source transform calls to some known global functions by replacing the function with a computed inline expansion. Merge blocks and eliminate if-if constructs. Substitute let variables. See section 10.

Constant propagation: This phase uses global flow analysis to propagate information about lexical variable values (and their types), eliminating unnecessary type checks and tests.

Type check generation: Emit explicit ICR code for any necessary type checks that are too complex to be easily generated on the fly by the back end. Print compile-time type warnings.

Event driven operations: Some ICR attributes are incrementally recomputed, either eagerly on modification of ICR, or lazily, when the relevant information is needed.

Environment analysis: Determine which distinct environments need to be allocated, and what context needs to be closed over by each environment. We detect non-local exits and mark locations where dynamic state must be restored. This is the last pure ICR pass.

TN allocation: Iterate over all defined functions, determining calling conventions and assigning TNs to local variables. Use type and policy information to determine which VMR translation to use for known functions, and then create TNs for expression evaluation temporaries. Print efficiency notes when a poor translation is chosen.

Control analysis: Linearize the flow graph in a way that minimizes the number of branches. The block-level structure of the flow graph is frozen at this point (see section 12.3.)

Stack analysis: Discard stack multiple values that are unused due to a local exit making the values receiver unreachable. The phase is only run when TN allocation actually uses the stack values representation.

VMR conversion: Convert the ICR into VMR by translating nodes into VOPs. Emit simple type checks. See section 12.

Copy propagation: Use flow analysis to eliminate unnecessary copying of TN values.

Representation selection: Look at all references to each TN to determine which representation has the lowest cost. Emit appropriate VOPs for coercions to and from that representation. See section 13.

Register allocation: Use flow analysis to find the live set at each call site and to build the conflict graph. Find a legal register allocation, attempting to minimize unnecessary moves. Registers are saved across calls according to precise lifetime information, avoiding unnecessary memory references. See section 14.

Code generation: Convert the VMR to machine instructions by calling the VOP code generators (see section 15.1.)

Instruction-level optimization: Some optimizations (such as instruction scheduling) must be done at the instruction level. This is not a general peephole optimizer.

Assembly and dumping: Resolve branches and convert to an in-core function or an object file with load-time fixup information.
9 The Implicit Continuation Representation (ICR)

ICR is a flow graph representation that directly supports flow analysis. The conversion from source into ICR throws away large amounts of syntactic information, eliminating the need to represent most environment manipulation special forms. Control special forms such as block and go are directly represented by the flow graph, rather than appearing as pseudo-expressions that obstruct the value flow. This elimination of syntactic and control information removes the need for most "beta transformation" optimizations[17].

The set of special forms recognized by ICR conversion is exactly that specified in the Common Lisp standard; all macros are really implemented using macros. The full Common Lisp lambda is implemented with a simple fixed-arg lambda, greatly simplifying later compiler phases.

9.1 Continuations

A continuation represents a place in the code, or alternatively, the destination of an expression result and a transfer of control[1]. This is the Implicit Continuation Representation because the environment is not directly represented (it is later recovered using flow analysis.) To make flow analysis more tractable, continuations are not first-class -- they are an internal compiler data structure whose representation is quite different from the function representation.
9.2 Node Types

In ICR, computation is represented by these node types:

if: Represents all conditionals.

set: Represents a setq.

ref: Represents a constant or variable reference.

combination: Represents a normal function call.

mv-combination: Represents a multiple-value-call. This is used to implement all multiple value receiving forms except for multiple-value-prog1, which is implicit.

bind: This represents the allocation and initialization of the variables in a lambda.

return: This collects the return value from a lambda and represents the control transfer on return.

entry: Marks the start of a dynamic extent that can have non-local exits to it. Dynamic state can be saved at this point for restoration on re-entry.

exit: Marks a potentially non-local exit. This node is interposed between the non-local uses of a continuation and the value receiver so that code to do a non-local exit can be inserted if necessary.

Some slots are shared between all node types (via defstruct inheritance.) This information held in common between all nodes often makes it possible to avoid special-casing nodes on the basis of type. The common information primarily concerns the order of evaluation and the destinations and properties of results. This control and value flow is indicated in the node by references to continuations.
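A skeletal rendering of this arrangement (our sketch, not Python's actual sources):

;; Slots common to every node, inherited via DEFSTRUCT :INCLUDE.
;; Control and value flow is expressed as continuation references.
(defstruct node
  prev    ; continuation evaluated immediately before this node
  cont)   ; continuation that receives this node's value

;; Subtypes add only their type-specific slots.
(defstruct (if-node (:include node))
  test consequent alternative)

(defstruct (ref-node (:include node))
  leaf)   ; the constant or variable being referenced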
9.3 Blocks

The block structure represents a basic block, in the control flow sense. Control transfers other than simple sequencing are represented by information in the block structure. The continuation for the last node in a block represents only the destination for the result.

9.4 Source Tracking

Source location information is recorded in each node. In addition to being required for source-level debugging, source information is also needed for compiler error messages, since it is very difficult to reconstruct anything resembling the original source from ICR.

10 Properties of ICR

ICR is a flow-graph representation of de-sugared Lisp. It combines most of the desirable properties of CPS with direct support for data flow analysis.

The ICR transformations are effectively source-to-source, but because ICR directly reflects the control semantics of Lisp (such as order of evaluation), it has different properties than an abstract syntax tree:

- Syntactically different expressions with the same control flow are canonicalized:

  (tagbody (go 1)
    2 y
      (go 3)
    1 x
      (go 2)
    3 z)
  <=>
  (progn x y z nil)

- Use/definition information is available, making substitutions easy.

- All lexical and syntactic information is frozen in during ICR conversion (alphatization[17]), so transformations can be made with no concern for variable name conflicts, scopes of declarations, etc.

- During ICR conversion this syntactic scoping information is discarded and the environment is made implicit, causing further canonicalization:

  (let ((fun (let ((val init))
               #'(lambda () ...val...))))
    ... (funcall fun) ...)
  <=>
  (let ((val init))
    (let ((fun #'(lambda () ...val...)))
      ... (funcall fun) ...))

- All the expressions that compute a value (uses of a continuation) are directly connected to the receiver of that value, so syntactic close delimiters don't hide optimization opportunities. As far as optimization of + is concerned, these forms are identical:

  (+ 3 (progn (gork) 42))
  (+ 3 42)
  (+ 3 (unwind-protect 42 (gork)))

As the last example shows, this holds true even when the value position is not tail-recursive.

11 Type Inference

Type inference is important for efficiently compiling generic arithmetic and other operations whose definition is based on ad-hoc polymorphism. These operations are generally only open-coded when the operand types are known at compile-time; insufficient type information can cause programs to run thirty times slower. Almost all prior work on Lisp type inference[4, 14, 9, 17] has concentrated on reducing the number of type declarations required to attain acceptable performance (especially for generic arithmetic operations.)

Type inference is also important because it allows compile-time type error messages and reduces the cost of run-time type checking.
11.1 Type Inference Techniques

In order to support type inferences involving complex types, type specifiers are parsed into an internal representation. This representation supports subtypep, type simplification, and the union, intersection and difference of arbitrary numeric subranges, member and or types. function types are used to represent function argument signatures. Baker[2] describes an efficient and relatively simple decision procedure for subtypep, but it doesn't support accountable type inference, since inferred types can't be inverted into intelligible type specifiers for reporting back to the user.

One important technique is to separately represent the results of forward and backward type inference[4, 8]. This separates provable type inferences from assertions, allowing type checks to be emitted only when needed.

Dynamic type inference is done using data flow analysis to solve a variant of the constant propagation problem. Wegman[19] offers an example of how constant propagation can be used to do Lisp type inference.
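The queries this internal representation must answer are ordinary subtypep questions over complex specifiers, for example:

(subtypep '(integer 3 7) 'fixnum)          ; => T, T
(subtypep '(or (integer 0 4) (integer 6 9))
          '(integer 0 9))                  ; => T, T
(subtypep 'fixnum '(integer 0))            ; => NIL, T

(The second return value indicates whether the answer is known; for specifiers built from numeric ranges, member and or, the relation is decidable.)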
12 The Virtual Machine Representation (VMR)

VMR preserves the block structure of ICR, but annotates the nodes with a target dependent Virtual Machine (VM) Representation. VMR is a generalized tuple representation derived from PQCC[11]. There are only two major objects in VMR: VOPs (Virtual OPerations) and TNs (Temporary Names). All instructions are generated by VOPs and all run-time values are stored in TNs.

12.1 Values and Types

A TN holds a single value. When the number of values is known, multiple TNs are used to hold multiple values; otherwise the values are forced onto the stack. TNs are used to represent user variables as well as expression evaluation temporaries (and other implicit values.)

A primitive type is a type meaningful at the implementation level; it is an abstraction of the precise Common Lisp type. Examples are fixnum, base-char and single-float. During VMR conversion the primitive type of a value is used to determine both where the value can be stored and which type-specific implementations of an operation can be applied to the value. A primitive type may have multiple possible representations (see Representation Selection).

VMR conversion creates as many TNs as necessary, annotating them only with their primitive type. This keeps VMR conversion from needing any information about the number or kind of registers, etc. It is the responsibility of Register Allocation to efficiently map the allocated TNs onto finite hardware resources.

12.2 Virtual OPerations

The PQCC VM technology differs from a conventional virtual machine in that it is not fixed. The set of VOPs varies depending on the target hardware. The default calling mechanisms and a few primitives are implemented using standard VOPs that must be implemented by each VM. In PQCC, it was a rule of thumb that a VOP should translate into about one instruction. VMR uses a number of VOPs that are much more complex (e.g. function call) in order to hide implementation details from VMR conversion. Calls to primitive functions such as + and car are translated to VOP equivalents using declarative information in the particular VM definition; VMR conversion makes no assumptions about which operations are primitive or what operand types are worth special-casing.

12.3 Control flow

In ICR, control transfers are implicit in the structure of the flow graph. Ultimately, a linear instruction sequence must be generated. The Control Analysis phase decides what order to emit blocks in. In effect, this amounts to deciding which arm of each conditional is favored by allowing it to drop through; the right decision gives optimizations such as loop rotation. A poor ordering would result in unnecessary unconditional branches.

In VMR all control transfers are explicit. Conditionals are represented using conditional branch VOPs that take a single target label and a not-p flag indicating whether the sense of the test is negated. An unconditional branch VOP is emitted afterward if the other path isn't a drop-through.
13 Representation selection

Some types of object (such as single-float) have multiple possible representations[6]. Multiple representations are useful mainly when there is a particularly efficient untagged representation. In this case, the compiler must choose between the normal tagged pointer representation and an alternate untagged representation. Representation selection has two subphases:

- TN representations are selected by examining all the TN references and choosing the representation with the lowest cost.

- Representation conversion code is inserted where the representation of a value is changed.

This phase is in effect a pre-pass to register allocation. The main reason for its separate existence is that representation conversions may be fairly complex (e.g. themselves requiring register allocation), and thus must be discovered before register allocation.
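A small example of where these sub-phases act (our illustration, not from the paper):

(defun rep-demo (n)
  (declare (fixnum n))
  (let ((x 1f0))
    (declare (single-float x))
    ;; Every reference to X inside the loop wants the untagged
    ;; single-float representation, so that representation wins.
    (dotimes (i n)
      (setf x (* x 1.5f0)))
    ;; X escapes to a polymorphic context here, so conversion code
    ;; coercing it back to a tagged pointer is inserted at this point.
    (list x)))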
14 Register allocation

Register allocation consists of two passes: one which does an initial register allocation, and a post-pass that inserts spilling code to satisfy violated register allocation constraints.

The interface to the register allocator is derived from the extremely general storage description model developed for the PQCC project[12]. The two major concepts are:

storage base (SB): A storage resource such as a register file or stack frame. It is composed of same-size units which are jointly allocated to create larger values. Storage bases may be fixed-size (for registers) or variable in size (for memory.)

storage class (SC): A subset of the locations in an underlying SB, with an associated element size. Examples are descriptor-reg (based on the register SB), and double-float-stack (based on the number-stack SB, element size 2, offsets dual-word aligned.)

In addition to allowing accurate description of any conceivable set of hardware storage resources, the SC/SB model is synergistic with representation selection. Representations are SCs. The set of legal representations for a primitive type is just a list of SCs. This is above and beyond the advantages noted in [6] of having a general-purpose register allocator as in the TNBind/Pack model[10].

15 Virtual Machine Definition

The VM description for an architecture has three major parts:

- The SB and SC definition for the storage resources, with their associated primitive types.

- The instruction set definition used by the assembler, assembly optimizer and disassembler.

- The definitions of the VOPs.

15.1 VOP definition example

VOPs are defined using the define-vop macro. Since classes of operations (memory references, ALU operations, etc.) often have very similar implementations, define-vop provides two mechanisms for sharing the implementation of similar VOPs: inheritance and VOP variants. It is common to define an "abstract VOP" which is then inherited by multiple real VOPs. In this example, single-float-op is an abstract VOP which is used to define all dyadic single-float operations on the MIPS R3000 (only the + definition is shown.)
(define-vop (single-float-op)
  (:args (x :scs (single-reg))
         (y :scs (single-reg)))
  (:results (r :scs (single-reg)))
  (:variant-vars operation)
  (:policy :fast-safe)
  (:note "inline float arithmetic")
  (:vop-var vop)
  (:save-p :compute-only)
  (:generator 2
    (note-this-location vop :internal-error)
    (inst float-op operation :single r x y)))

(define-vop (+/single-float single-float-op)
  (:variant '+))
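Other dyadic operations simply define further variants of the same abstract VOP; for instance, a subtraction VOP (our extrapolation, mirroring the + definition above) would be:

(define-vop (-/single-float single-float-op)
  (:variant '-))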
16 Evaluation

For typical benchmarks, CMU CL run-times are modestly better than commercial implementations. When average speeds of the non-I/O, non-system Gabriel benchmarks were compared to a commercial implementation, CMU unsafe code was 1.2 times faster, and safe code was 1.8 times faster. Applications that make heavy use of Python's unique features (such as partial evaluation and numeric representation) can be 5x or more faster than commercial implementations.

Generally, poorly tuned programs show greater performance improvements under CMU CL than highly tweaked benchmarks, since Python makes a greater effort to consistently implement all language features efficiently and to provide efficiency diagnostics.

The non-efficiency advantages of CMU CL are at least as important, but harder to quantify. Commercial implementations do weaker type checking than CMU CL, and do not support source-level debugging of optimized code. It is also easier to write programs when the programmer can assume a high level of optimization.

The main cost of this additional functionality is in compile time: compilations using Python may be as much as 3x longer than they are for the quickest commercial Common Lisp compilers. However, on modern workstations with at least 24 megabytes of memory, incremental recompilation times are still quite acceptable. A substantial part of the slowdown results from poor memory system performance caused by Python's large code size (3.4 megabytes) and large data structures (0.3-3 megabytes.)

17 Summary

- Strict type checking and source-level debugging make development easier.

- It is both possible and useful to give compile-time warnings about type errors in Common Lisp programs.

- Optimization can reduce the overhead of type checking on conventional hardware.

- The difficulty of producing efficient numeric and array code in Common Lisp is largely due to the compiler's opacity to the user. The lack of a simple efficiency model can be compensated for by making the compiler accountable to the user for its optimization decisions.

- Partial evaluation is a practical programming tool when combined with inline expansion and block compilation. Lisp-level optimization is important because it allows programs to be written at a higher level without sacrificing efficiency.

- It is possible to get most of the advantages of continuation passing style without the disadvantages.

- The clear ICR/VMR division helps portability. It is probably a mistake to convert a Lisp-based representation directly to assembly language.

- A very general model of storage allocation (such as the one used in PQCC) is extremely helpful when defining untagged representations for Lisp values. This is above and beyond the utility of a general register allocator noted in [6].

18 Acknowledgments

Scott Fahlman's infinite patience and William Lott's machine-level expertise made this project possible.
This research was sponsored by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title "Research on Parallel Computing", ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA972-90-C-0035. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

References

[1] Appel, A. W., and Jim, T. Continuation-passing, closure-passing style. In ACM Conference on the Principles of Programming Languages (1989), ACM, pp. 293-302.

[2] Baker, H. G. A decision procedure for Common Lisp's subtypep predicate. To appear in Lisp and Symbolic Computation, Sept. 1990.

[3] Baker, H. G. The Nimble project -- an applications compiler for real-time Common Lisp. In Proceedings of the InfoJapan '90 Conference (Oct. 1990), Information Processing Society of Japan.

[4] Beer, R. The compile-time type inference and type checking of Common Lisp programs: a technical summary. Tech. Rep. TR-88-116, Case Western Reserve University, 1988.

[5] Brooks, R. A., and Gabriel, R. P. A critique of Common Lisp. In ACM Conference on Lisp and Functional Programming (1984), pp. 1-8.

[6] Brooks, R. A., Gabriel, R. P., and Steele Jr., G. L. An optimizing compiler for lexically scoped Lisp. In ACM Symposium on Compiler Construction (1982), ACM, pp. 261-275.

[7] Gabriel, R. P. Lisp: Good news, bad news, how to win big. To appear, 1991.

[8] Johnson, P. M. Type Flow Analysis for Exploratory Software Development. PhD thesis, University of Massachusetts, Sept. 1990.

[9] Kranz, D., Kelsey, R., Rees, J., et al. Orbit: An optimizing compiler for Scheme. In ACM Symposium on Compiler Construction (1986), ACM, pp. 219-233.

[10] Leverett, B. W. Register Allocation in Optimizing Compilers. UMI Research Press, 1983.

[11] Leverett, B. W., Cattel, R. G. G., Hobbs, S. O., Newcomer, J. M., Reiner, A. H., Schatz, B. R., and Wulf, W. A. An overview of the Production Quality Compiler-Compiler project. Computer 13, 8 (Aug. 1980), 38-49.

[12] Reiner, A. Cost minimization in register assignment for retargetable compilers. Tech. Rep. CMU-CS-84-137, Carnegie Mellon University School of Computer Science, June 1984.

[13] Scheifler, R. W. An analysis of inline substitution for a structured programming language. Communications of the ACM 20, 9 (Sept. 1977), 647-654.

[14] Shivers, O. Control-Flow Analysis in Higher-Order Languages. PhD thesis, Carnegie Mellon University School of Computer Science, May 1991.

[15] Steele, B. S. K. An accountable source-to-source transformation system. Tech. Rep. AI-TR-636, MIT AI Lab, June 1981.

[16] Steele Jr., G. L. Fast arithmetic in MacLisp. In Proceedings of the 1977 MACSYMA User's Conference (1977), NASA.

[17] Steele Jr., G. L. Rabbit: A compiler for Scheme. Tech. Rep. AI-TR-474, MIT AI Lab, May 1978.

[18] Teitelman, W., et al. Interlisp Reference Manual. Xerox Palo Alto Research Center, 1978.

[19] Wegman, M. N., and Zadeck, K. Constant propagation with conditional branches. Tech. Rep. CS-89-36, Brown University, Department of Computer Science, 1989.

[20] Winograd, T. Breaking the complexity barrier again. In ACM SIGPLAN-SIGIR Interface Meeting (1973), ACM, pp. 13-22.

[21] Winograd, T. Beyond programming languages. Communications of the ACM 22, 7 (Aug. 1979), 391-401.

[22] Zellweger, P. T. Interactive source-level debugging of optimized programs. Tech. Rep. CSL-84-5, Xerox Palo Alto Research Center, May 1984.

[23] Zurawski, L. W., and Johnson, R. E. Debugging optimized code with expected behavior. To appear, Apr. 1991.