A Brief History of Just-In-Time

John Aycock
Department of Computer Science
University of Calgary

Software systems have been using "just-in-time" compilation (JIT) techniques since the 1960s.
Broadly, JIT compilation includes any translation performed dynamically, after a program has
started execution. We examine the motivation behind JIT compilation and the constraints imposed on
JIT compilation systems, and present a classification scheme for such systems. This classification
emerges as we survey forty years of JIT work, from 1960-2000.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors; K.2 [Computing Milieux]: History of Computing - Software
General Terms: Languages, Performance
Additional Key Words and Phrases: Just-in-time compilation, dynamic compilation


Those who cannot remember the past are condemned to repeat it.
George Santayana, 1863-1952 [Bartlett 1992]
This oft-quoted line is all too applicable in computer science. Ideas are generated,
explored, set aside, only to be reinvented years later. Such is the case with what
is now called "just-in-time" or dynamic compilation, which refers to translation
that occurs after a program begins execution.
Strictly speaking, JIT compilation systems ("JIT systems" for short) are completely unnecessary.
They are only a means to improve the time and space efficiency
of programs. After all, the central problem JIT systems address is a solved one:
translating programming languages into a form that is executable on a target platform.
What is translated? The scope and nature of programming languages that require
translation into executable form cover a wide spectrum. Traditional programming
languages like Ada, C, and Java are included, as well as little languages [Bentley
1988] such as regular expressions.
Traditionally, there are two approaches to translation: compilation and interpretation.
Compilation translates one language into another (C to assembly
language, for example) with the implication that the translated form will be
more amenable to later execution, possibly after further compilation stages.
Interpretation eliminates these intermediate steps, performing the same analyses as
compilation, but performing execution immediately.

This work was supported in part by a grant from the National Science and Engineering Research
Council of Canada.
Name: John Aycock
Affiliation: Department of Computer Science, University of Calgary
Address: 2500 University Drive N.W., Calgary, Alberta, Canada T2N 1N4
Email: aycock@cpsc.ucalgary.ca

JIT compilation is used to gain the benefits of both (static) compilation and
interpretation. These benefits will be brought out in later sections, so we only
summarize them here:
- Compiled programs run faster, especially if they are compiled into a form which
  is directly executable on the underlying hardware. Static compilation can also
  devote an arbitrary amount of time to program analysis and optimization. This
  brings us to the primary constraint on JIT systems: speed. A JIT system must
  not cause untoward pauses in normal program execution as a result of its operation.
- Interpreted programs are typically smaller, if only because the representation
  chosen is at a higher level than machine code, and can carry much more semantic
  information implicitly.
- Interpreted programs tend to be more portable. Assuming a machine-independent
  representation, such as high-level source code or virtual machine code, only the
  interpreter need be supplied to run the program on a different machine. (Of
  course, the program still may be doing nonportable operations, but that's a different matter.)
- Interpreters have access to run-time information, such as input parameters, that
  may be undecidable using static analysis.
To narrow our focus somewhat, we only examine software-based JIT systems
that have a nontrivial translation aspect. Keppel, Eggers, and Henry eloquently
build an argument for the more general case of run-time code generation, where
this latter restriction is removed [Keppel et al. 1991].
Note that we use the term "execution" in a broad sense: we call a program
representation executable if it can be executed by the JIT system in any manner,
either directly as in machine code, or indirectly using an interpreter.

Work on JIT compilation techniques often focuses on the implementation of a
particular programming language. We have followed this same division in this
section, ordering from earliest to latest where possible.
2.1 Genesis

Self-modifying code has existed since the earliest days of computing, but we exclude
that from consideration because there is typically no compilation or translation
aspect involved.
Instead, we suspect that the earliest published work on JIT compilation is McCarthy's 1960
LISP paper. He mentions compilation of functions into machine
language, a process fast enough that the compiler's output needn't be saved [McCarthy 1960].
This can be seen as an inevitable result of having programs and data
share the same notation [McCarthy 1981].
Another early published reference to JIT compilation dates back to 1966. The
University of Michigan Executive System for the IBM 7090 explicitly notes that the
assembler [University of Michigan 1966b, page 1] and loader [University of Michigan
1966a, page 6] can be used to translate and load during execution. (The manual's
preface says that most sections were written before August 1965, so this likely dates
back further.)
Thompson's 1968 CACM paper is frequently cited as "early work" in modern
publications. He compiles regular expressions into IBM 7094 code in an ad hoc
fashion, code which is then executed to perform matching [Thompson 1968].

2.2 LC²

The Language for Conversational Computing, or LC², was a language designed for
interactive programming [Mitchell et al. 1968]. Although used briefly at Carnegie-Mellon
University for teaching, LC² was primarily an experimental language [Mitchell
2000]. It might otherwise be consigned to the dustbin of history, if not for the
techniques used by Mitchell in its implementation [Mitchell 1970], techniques that later
influenced JIT systems for Smalltalk and Self.
Mitchell observed that compiled code can be derived from an interpreter at run-time,
simply by storing the actions performed during interpretation. This only
works for code that has been executed, however; he gives the example of an
if-then-else statement, where only the else-part is executed. To handle such cases,
code was generated for the unexecuted part which re-invoked the interpreter should
it ever be executed (the then-part, in the example above).
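The idea of recording an interpreter's actions can be illustrated with a minimal Python sketch. This is not Mitchell's implementation: the tuple-based statement forms ("set", "add", "if") and the names `interpret` and `run` are hypothetical, chosen only to show recorded actions being replayed and an untaken branch falling back to the interpreter.

```python
def run(code, env):
    """Execute recorded actions directly, without re-interpreting statements."""
    for action in code:
        action(env)

def interpret(stmts, env):
    """Interpret stmts against env and return the actions performed as 'compiled code'."""
    code = []
    for stmt in stmts:
        op = stmt[0]
        if op == "set":                       # ("set", name, constant)
            _, name, value = stmt
            action = lambda e, n=name, v=value: e.__setitem__(n, v)
        elif op == "add":                     # ("add", name, constant)
            _, name, amount = stmt
            action = lambda e, n=name, a=amount: e.__setitem__(n, e[n] + a)
        elif op == "if":                      # ("if", var, then_part, else_part)
            _, var, then_part, else_part = stmt
            cond = bool(env[var])
            taken, untaken = (then_part, else_part) if cond else (else_part, then_part)
            taken_code = interpret(taken, env)    # executes and records the taken part

            def action(e, var=var, cond=cond, taken_code=taken_code, untaken=untaken):
                if bool(e[var]) == cond:
                    run(taken_code, e)            # replay the recorded path
                else:
                    interpret(untaken, e)         # stub: re-invoke the interpreter
            code.append(action)
            continue                              # the taken part has already run
        else:
            raise ValueError(f"unknown statement {op!r}")
        action(env)                               # perform the action now...
        code.append(action)                       # ...and store it for later
    return code

program = [("set", "x", 1),
           ("if", "flag", [("add", "x", 10)], [("add", "x", 100)])]
compiled = interpret(program, {"flag": 0})        # only the else-part is recorded
```

Running `compiled` with `flag` set to 1 reaches the stub, which interprets the then-part that was never executed during recording.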
2.3 APL

The seminal work on efficient APL implementation is Abrams' dissertation [Abrams
1970]. Abrams concocted two key APL optimization strategies, which he described
using the connotative terms "drag-along" and "beating." Drag-along defers expression
evaluation as long as possible, gathering context information in the hopes
that a more efficient evaluation method might become apparent; this might now be
called lazy evaluation. Beating is the transformation of code to reduce the amount
of data manipulation involved during expression evaluation.
Drag-along and beating relate to JIT compilation because APL is a very dynamic
language; types and attributes of data objects are not, in general, known until run-time.
To fully realize these optimizations' potential, their application must be
delayed until run-time information is available.
Abrams' "APL Machine" employed two separate JIT compilers. The first translated
APL programs into postfix code for a D-machine¹, which maintained a buffer
of deferred instructions. The D-machine acted as an 'algebraically simplifying compiler'
[Abrams 1970, page 84] which would perform drag-along and beating at run-time,
invoking an E-machine to execute the buffered instructions when necessary.
Abrams' work was directed towards an architecture for efficient support of APL,
hardware support for high-level languages being a popular pursuit of the time.
Abrams never built the machine, however; an implementation was attempted a few
years later [Schroeder and Vaughn 1973].² The techniques were later expanded upon
by others [Miller 1977], although the basic JIT nature never changed, and were used
for the software implementation of Hewlett-Packard's APL\3000 [Johnston 1977;
van Dyke 1977].

¹ Presumably "D" stood for "Deferral" or "Drag-Along."
² In the end, Litton never built the machine [Mauriello 2000].

Fig. 1. The time-space tradeoff. (Figure labels: source code; interpreted virtual machine code; native code.)

2.4 Mixed Code, Throw-Away Code, and BASIC

The tradeoff between execution time and space often underlies the argument for
JIT compilation. This tradeoff is summarized in Figure 1. The other consideration
is that most programs spend the majority of time executing a minority of code,
based on data from empirical studies [Knuth 1971]. Two ways to reconcile these
observations have appeared: mixed code and throw-away compiling.
"Mixed code" refers to the implementation of a program as a mixture of native
code and interpreted code, proposed independently by [Dakin and Poole 1973]
and [Dawson 1973]. The frequently-executed parts of the program would be in native
code, the infrequently-executed parts interpreted, hopefully yielding a smaller
memory footprint with little or no impact on speed. A fine-grained mixture is
implied: implementing the program with interpreted code and the libraries with
native code would not constitute mixed code.
A further twist to the mixed code approach involved customizing the interpreter
[Pittman 1987]. Instead of mixing native code into the program, the native
code manifests itself as special virtual machine instructions; the program is then
compiled entirely into virtual machine code.
The basic idea of mixed code, switching between different types of executable
code, is still applicable to JIT systems, although few researchers at the time
advocated generating the machine code at run-time. Keeping both a compiler and
an interpreter in memory at run-time may have been considered too costly on the
machines of the day, negating any program size tradeoff.
The case against mixed code comes from software engineering [Brown 1976].
Even assuming that the majority of code will be shared between the interpreter
and compiler, there are still two disparate pieces of code (the interpreter proper
and the compiler's code generator) which must be maintained and exhibit identical
behavior.
(Proponents of partial evaluation, or program specialization, will note that this
is a specious argument in some sense, because a compiler can be thought of as a
specialized interpreter [Jones et al. 1993]. However, the use of partial evaluation
techniques is not currently widespread.)
This brings us to the second manner of reconciliation: throw-away compiling [Brown
1976]. This was presented purely as a space optimization: instead of static
compilation, parts of a program could be compiled dynamically on an as-needed basis.
Upon exhausting memory, some or all of the compiled code could be thrown away;
the code would be regenerated later if necessary.
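A minimal Python sketch of throw-away compiling follows, under loud assumptions: memory is modeled as a fixed number of compiled routines, `compile_routine` is a stand-in code generator built on `eval`, and the class name `ThrowAwayCache` and the routines in `SOURCES` are hypothetical. It is not Brown's BASIC system; it only shows compile-on-demand, discarding when space runs out, and regeneration.

```python
SOURCES = {"double": "x * 2", "square": "x * x", "negate": "-x"}  # hypothetical routines

def compile_routine(name):
    """Stand-in for a real code generator: builds a Python function from source text."""
    return eval("lambda x: " + SOURCES[name])

class ThrowAwayCache:
    def __init__(self, compile_fn, capacity):
        self.compile_fn = compile_fn
        self.capacity = capacity      # stand-in for available memory, in routines
        self.code = {}                # name -> compiled routine
        self.compilations = 0

    def call(self, name, *args):
        if name not in self.code:
            if len(self.code) >= self.capacity:
                self.code.clear()     # memory exhausted: throw compiled code away
            self.code[name] = self.compile_fn(name)   # compile on an as-needed basis
            self.compilations += 1
        return self.code[name](*args)

cache = ThrowAwayCache(compile_routine, capacity=2)
cache.call("double", 3)
cache.call("square", 3)
cache.call("negate", 3)   # no room left: everything is discarded first
cache.call("double", 4)   # regenerated because it was thrown away
```

This sketch discards all compiled code at once; discarding only some of it (for example, the least recently used routine) is an equally valid reading of "some or all".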
BASIC was the testbed for throw-away compilation. Brown essentially characterized
the technique as a good way to address the time-space tradeoff [Brown 1976];
Hammond was somewhat more adamant, claiming throw-away compilation to be
superior except when memory is tight [Hammond 1977].
A good discussion of mixed code and throw-away compiling may be found in [Brown

Some of the first work on JIT systems where programs automatically optimize
their "hot spots" at run-time is due to Hansen [Hansen 1974].³ He addressed three
important questions:
(1) What code should be optimized? Hansen chose a simple, low-cost frequency
    model, maintaining a frequency-of-execution counter for each block of code (we
    use the generic term "block" to describe a unit of code; the exact nature of a
    block is immaterial for our purposes).
(2) When should the code be optimized? The frequency counters served a second
    role: crossing a threshold value made the associated block of code a candidate
    for the next "level" of optimization, as described below. "Supervisor" code was
    invoked between blocks, which would assess the counters, perform optimization
    if necessary, and transfer control to the next block of code. The latter operation
    could be a direct call, or interpreter invocation; mixed code was supported
    by Hansen's design.
(3) How should the code be optimized? A set of conventional machine-independent
    and machine-dependent optimizations were chosen and ordered, so a block
    might first be optimized by constant folding, by common subexpression
    elimination the second time optimization occurs, by code motion the third time,
    and so on. Hansen observes that this scheme limits the amount of time taken
    at any given optimization point (especially important if the frequency model
    proves to be incorrect), as well as allowing optimizations to be incrementally
    added to the compiler.
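The three answers above can be sketched together in a few lines of Python. This is an illustration only, not Hansen's Adaptive FORTRAN: the threshold values in `LEVELS` are invented, the names `Block` and `supervisor` are hypothetical, and "applying" an optimization merely records its name.

```python
# (threshold, optimization) pairs, applied strictly in this order; thresholds are made up.
LEVELS = [
    (3, "constant folding"),
    (6, "common subexpression elimination"),
    (12, "code motion"),
]

class Block:
    def __init__(self, name):
        self.name = name
        self.count = 0        # frequency-of-execution counter
        self.applied = []     # optimizations performed so far, in order

def supervisor(block):
    """Invoked between blocks: bump the counter and apply at most one new optimization."""
    block.count += 1
    level = len(block.applied)
    if level < len(LEVELS):
        threshold, optimization = LEVELS[level]
        if block.count >= threshold:
            block.applied.append(optimization)   # a real system would transform code here

hot = Block("inner loop")
for _ in range(7):
    supervisor(hot)
# hot.applied now holds the first two optimizations; the third waits for count 12.
```

Because at most one optimization level is applied per supervisor call, the work done at any single optimization point stays bounded, which is the property Hansen highlights.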
Programs using the resulting Adaptive FORTRAN system reportedly were not
always faster than their statically compiled-and-optimized counterparts, but
performed better overall.
Returning again to mixed code, Ng and Cantoni implemented a variant of FORTRAN
using this technique, in a sense [Ng and Cantoni 1976]. Their system
could compile functions at run-time into "pseudo-instructions," probably a
tokenized form of the source code rather than a lower-level virtual machine code.
The pseudo-instructions would then be interpreted. They claimed that run-time
compilation was useful for some applications and avoided a slow compile-link process.
They did not produce mixed code at run-time; their use of the term referred
to the ability to have statically-compiled FORTRAN programs call their
pseudo-instruction interpreter automatically when needed via linker trickery.

³ [Dawson 1973] mentions a 1967 report by Barbieri and Morrissey where a program begins
execution in interpreted form, and frequently-executed parts 'can be converted to machine code.'
However, it is not clear if the conversion to machine code occurred at run-time. Unfortunately,
we have not been able to obtain the cited work as of this writing.


2.6 Smalltalk

Smalltalk source code is compiled into virtual machine code when new methods are
added to a class [Goldberg and Robson 1985]. The performance of naive Smalltalk
implementations left something to be desired, however.
Rather than attack the performance problem with hardware, Deutsch and Schiffman
made key optimizations in software. The observation behind this was that
they could pick the most efficient representation for information, so long as
conversion between representations happened automatically and transparently to the
user [Deutsch and Schiffman 1984].
JIT conversion of virtual machine code to native code was one of the optimization
techniques they used, a process they likened to macro-expansion. Procedures were
compiled to native code lazily, when execution entered the procedure; the native
code was cached for later use. Their system was linked to memory management in
that native code would never be paged out, just thrown away and regenerated later
if necessary.
In turn, Deutsch and Schiffman credit the dynamic translation idea to Rau [Rau
1978]. Rau was concerned with "universal host machines" which would execute a
variety of high-level languages well (compared to, say, a specialized APL machine).
He proposed dynamic translation to microcode at the granularity of single virtual
machine instructions. A hardware cache, the dynamic translation buffer, would
store completed translations; a cache miss would signify a missing translation, and
fault to a dynamic translation routine.
2.7 Self

The Self programming language [Ungar and Smith 1987; Smith and Ungar 1995],
in contrast to many of the other languages mentioned in this section, is primarily a
research vehicle. Self is in many ways influenced by Smalltalk, in that both are pure
object-oriented languages: everything is an object. But Self eschews classes in
favor of prototypes, and otherwise attempts to unify a number of concepts. Every
action is dynamic and changeable, and even basic operations, like local variable
access, require invocation of a method. To further complicate matters, Self is a
dynamically-typed language, meaning that the types of identifiers are not known
until run-time.
Self's unusual design makes efficient implementation difficult. This resulted in the
development of the most aggressive, ambitious JIT compilation and optimization
up to that time. The Self group noted three distinct generations of compiler [Hölzle
1994], an organization we follow below; in all cases, the compiler was invoked
dynamically upon a method's invocation, as in Deutsch and Schiffman's Smalltalk.
2.7.1 First Generation. Almost all the optimization techniques employed by Self
compilers dealt with type information, and transforming a program in such a way
that some certainty could be had about the type of identifiers. Only a few techniques
had a direct relationship with JIT compilation, however.
Chief among these, in the first generation Self compiler, was customization
[Chambers et al. 1989; Chambers and Ungar 1989; Chambers 1992]. Instead of
dynamically compiling a method into native code that would work for any invocation of
the method, the compiler produced a version of the method that was customized
to that particular context. Much more type information was available to the JIT
compiler compared to static compilation, and by exploiting this fact the resulting
code was much more efficient. While method calls from similar contexts could
share customized code, "overcustomization" could still consume a lot of memory at
run-time; ways to combat this problem were later studied [Dieckmann and Hölzle
2.7.2 Second Generation. The second generation Self compiler extended one of
the program transformation techniques used by its predecessor, and computed much
better type information for loops [Chambers and Ungar 1990; Chambers 1992].
This Self compiler's output was indeed faster than that of the first generation,
but it came at a price. The compiler ran 15 to 35 times more slowly on benchmarks
[Chambers and Ungar 1990; Chambers and Ungar 1991], to the point where
many users refused to use the new compiler! [Hölzle 1994]
Modifications were made to the responsible algorithms to speed up compilation
[Chambers and Ungar 1991]. One such modification was called "deferred compilation
of uncommon cases."⁴ The compiler is informed that certain events, such
as arithmetic overflow, are unlikely to occur. That being the case, no code is
generated for these uncommon cases; a stub is left in the code instead, which will invoke
the compiler again if necessary. The practical result of this is that the code for
uncommon cases need not be analyzed upon initial compilation, saving a substantial
amount of time.⁵
Ungar, Smith, Chambers, and Hölzle [1992] contains a good presentation of
optimization techniques used in Self and the resulting performance in the first and
second generation compilers.
2.7.3 Third Generation. The third generation Self compiler attacked the issue of
slow compilation at a much more fundamental level. The Self compiler was part
of an interactive, graphical programming environment; executing the compiler
on-the-fly resulted in a noticeable pause in execution. Hölzle argued that measuring
pauses in execution for JIT compilation by timing how long the compiler
took to run was deceptive, and not representative of the user's experience [Hölzle
1994; Hölzle and Ungar 1994b]. Two invocations of the compiler could be separated
by a brief spurt of program execution, but would be perceived as one long pause by
the user. Hölzle compensated by considering temporally-related groups of pauses,
or 'pause clusters,' rather than individual compilation pauses.
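As a rough illustration of why clusters matter, the Python sketch below groups compilation pauses that are separated by only a short burst of execution. The grouping rule used here (merge pauses whose gap is at most `max_gap`) is a deliberate simplification and an assumption; it is not Hölzle's actual definition of a pause cluster, and the function name and millisecond units are hypothetical.

```python
def pause_clusters(pauses, max_gap):
    """Group (start, duration) pauses separated by at most max_gap into clusters.

    Returns a list of (start, perceived_length) pairs, where perceived_length
    spans from the first pause's start to the last pause's end in the cluster.
    """
    clusters = []                                  # each entry is [start, end]
    for start, duration in sorted(pauses):
        end = start + duration
        if clusters and start - clusters[-1][1] <= max_gap:
            clusters[-1][1] = max(clusters[-1][1], end)   # extend the current cluster
        else:
            clusters.append([start, end])                 # begin a new cluster
    return [(s, e - s) for s, e in clusters]

# Two 100 ms and 90 ms compiles separated by 10 ms of execution, then a later 50 ms one.
pauses_ms = [(0, 100), (110, 90), (1000, 50)]
clusters = pause_clusters(pauses_ms, max_gap=20)
```

Measured individually, the longest pause here is 100 ms; under this grouping rule the first two pauses form a single 200 ms cluster, which is closer to what a user would perceive.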
⁴ In Chambers' thesis, this is referred to as "lazy compilation of uncommon branches," an idea he
attributes to a suggestion by John Maloney in 1989 [Chambers 1992, page 123]. However, this is
the same technique used in [Mitchell 1970], albeit for different reasons.
⁵ This technique can be applied to dynamic compilation of exception handling code [Lee et al.

As for the compiler itself, compilation time was reduced, or at least spread out,
by using adaptive optimization, similar to Hansen's FORTRAN work. Initial
method compilation was performed by a fast, non-optimizing compiler;
frequency-of-invocation counters were kept for each method to determine when recompilation
should occur [Hölzle 1994; Hölzle and Ungar 1994b; Hölzle and Ungar 1994a]. They
make an interesting comment on this mechanism:
. . . in the course of our experiments we discovered that the trigger mechanism
("when") is much less important for good recompilation results
than the selection mechanism ("what"). [Hölzle 1994, page 38]⁶
This may come from the slightly counter-intuitive notion that the best candidate
for recompilation is not necessarily the method whose counter triggered the
recompilation. Object-oriented programming style tends to encourage short methods;
a better choice may be to (re)optimize the method's caller and incorporate the
frequently-invoked method inline [Hölzle and Ungar 1994b].
Adaptive optimization adds the complication that a modified method may already
be executing, and have information (such as an activation record on the
stack) that depends on the previous version of the modified method [Hölzle 1994];
this must be taken into consideration.⁷
The Self compiler's JIT optimization was assisted by the introduction of "type
feedback" [Hölzle 1994; Hölzle and Ungar 1994a]. As a program executed, type
information was gathered by the run-time system, a straightforward process. This
type information would then be available if and when recompilation occurred,
permitting more aggressive optimization. Information gleaned using type feedback was
later shown to be comparable with, and perhaps complementary to, information
from static type inference [Agesen and Hölzle 1995; Agesen 1996].
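Type feedback can be sketched in Python as a call site that counts the receiver types it sees and, on "recompilation", produces code specialized for the most common one. This is a hedged illustration, not the Self implementation: the `CallSite` class, its method names, and the `Circle` and `Square` classes are hypothetical, and binding the target method early stands in for the inlining a real compiler would perform.

```python
from collections import Counter

class CallSite:
    def __init__(self, selector):
        self.selector = selector
        self.seen = Counter()                     # receiver type -> times observed

    def send(self, receiver, *args):
        """Unoptimized send: record the receiver's type, then look the method up."""
        self.seen[type(receiver)] += 1
        return getattr(receiver, self.selector)(*args)

    def recompile(self):
        """Return a version of the send specialized for the most frequent receiver type."""
        if not self.seen:
            return self.send                      # no feedback gathered yet
        hot_type, _ = self.seen.most_common(1)[0]
        target = getattr(hot_type, self.selector)   # bound once, at "compile time"
        selector = self.selector

        def specialized(receiver, *args):
            if type(receiver) is hot_type:        # cheap type test guards the fast path
                return target(receiver, *args)
            return getattr(receiver, selector)(*args)   # fall back to dynamic lookup
        return specialized

class Circle:
    def __init__(self, r): self.r = r
    def area(self): return 3 * self.r * self.r    # integer approximation for the demo

class Square:
    def __init__(self, s): self.s = s
    def area(self): return self.s * self.s

site = CallSite("area")
for shape in (Circle(1), Circle(2), Square(3)):
    site.send(shape)
fast_area = site.recompile()                      # specialized for Circle
```

The guard keeps the specialized code correct for receivers that were rare or unseen during profiling; they simply take the slower path.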
2.8 Slim Binaries and Oberon

One problem with software distribution and maintenance is the heterogeneous
computing environment in which software runs: different computer architectures require
different binary executables. Even within a single line of backwards-compatible
processors, many variations in capability can exist; a program statically compiled for
the least-common denominator of processor may not take full advantage of the
processor on which it eventually executes.
In his doctoral work, Franz addressed these problems using "slim binaries" [Franz
1994; Franz and Kistler 1997]. A slim binary contains a high-level,
machine-independent representation⁸ of a program module. When a module is loaded,
executable code is generated for it on-the-fly, which can presumably tailor itself
to the run-time environment. Franz, and later Kistler, claimed that generating
code for an entire module at once was often superior to the method-at-a-time strategy
used by Smalltalk and Self, in terms of the resulting code performance [Franz
1994; Kistler 1999].
Fast code generation was critical to the slim binary approach. Data structures
were delicately arranged to facilitate this; generated code that could be reused was
noted and copied if needed later, rather than being regenerated [Franz 1994].
⁶ The same comment, with slightly different wording, also appears in [Hölzle and Ungar 1994a,
page 328].
⁷ Hansen's work could ignore this possibility; the FORTRAN of the time did not allow recursion,
and so activation records and a stack were unnecessary [Sebesta 1999].
⁸ An abstract syntax tree, to be precise.

Franz implemented slim binaries for the Oberon system, which allows dynamic
loading of modules [Wirth and Gutknecht 1989]. Loading and generating code for
a slim binary was not faster than loading a traditional binary [Franz 1994; Franz
and Kistler 1997], but Franz argued that this would eventually be the case as the
speed discrepancy between processors and I/O devices increased [Franz 1994].
Using slim binaries as a starting point, Kistler's work investigated "continuous"
run-time optimization, where parts of an executing program can be optimized ad
infinitum. He contrasts this to the adaptive optimization used in Self, where
optimization of methods would eventually cease [Kistler 1999].
Of course, re-optimization is only useful if a new, better, solution can be obtained;
this implies that continuous optimization is best suited to optimizations whose
input varies over time with the program's execution.⁹ Accordingly, Kistler looked
at cache optimizations (rearranging fields in a structure dynamically to optimize
a program's data-access patterns [Kistler 1999; Kistler and Franz 1999]) and a
dynamic version of trace scheduling, which optimizes based on information about
a program's control flow during execution [Kistler 1999].
The continuous optimizer itself executes in the background, as a separate
low-priority thread which executes only during a program's idle time [Kistler 1997;
Kistler 1999]. Kistler uses a more sophisticated metric than straightforward
counters to determine when to optimize, and observes that deciding what to optimize
is highly optimization-specific [Kistler 1999].
2.9 Templates, ML, and C

ML and C make strange bedfellows, but the same approach has been taken to
dynamic compilation in both. This approach is called "staged compilation," where
compilation of a single program is divided into two stages: static and dynamic
compilation. Prior to run-time, a static compiler compiles "templates," essentially
building blocks which are pieced together at run-time by the dynamic compiler,
which may also place run-time values into holes left in the templates. Typically
these templates are specified by user annotations, although some work has been
done on deriving them automatically [Mock et al. 1999].
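The template idea can be mimicked in Python with closures, as a minimal sketch under stated assumptions: each "template" is a function prepared ahead of time whose parameter plays the role of a hole, and the hypothetical `instantiate` function acts as the dynamic stage that fills holes and pieces templates together. Real template systems emit machine code; nothing here does.

```python
# Static stage: templates with one hole each (the parameter is the hole).
def t_scale(k):
    return lambda x: x * k

def t_offset(c):
    return lambda x: x + c

def instantiate(plan):
    """Dynamic stage: fill each template's hole with a run-time value, then stitch in order."""
    pieces = [template(value) for template, value in plan]

    def code(x):
        for piece in pieces:
            x = piece(x)
        return x
    return code

# Run-time values (3 and 4) become constants inside the generated routine.
scale_then_offset = instantiate([(t_scale, 3), (t_offset, 4)])
result = scale_then_offset(5)   # (5 * 3) + 4
```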
As just described, template-based systems do not fit our description of JIT
compilers, since there would appear to be no nontrivial translation aspect. However,
templates may be encoded in a form which requires run-time translation before
execution, or the dynamic compiler may perform run-time optimizations after
connecting the templates.
Templates have been applied to (subsets of) ML [Leone and Lee 1994; Lee and
Leone 1996; Wickline et al. 1998]. They have also been used for run-time
specialization of C [Consel and Noel 1996; Marlet et al. 1999], as well as dynamic extensions
of C [Auslander et al. 1996; Engler et al. 1996; Poletto et al. 1997].
2.10 Simulation, Binary Translation, and Ma hine Code

Simulation is the process of running native executable machine code for one
architecture on another architecture.¹⁰ How does this relate to JIT compilation? One
of the techniques for simulation is binary translation; in particular, we focus on
dynamic binary translation that involves translating from one machine code to
another at run-time. Typically, binary translators are highly specialized with respect
to source and target; research on retargetable and "resourceable" binary translators
is still in its infancy [Ung and Cifuentes 2000]. Altman, Kaeli, and Sheffer [2000]
has a good discussion of the challenges involved in binary translation, and Cmelik
and Keppel [1994] compares pre-1995 simulation systems in detail. Rather than
duplicating their work, we will take a higher-level view.

⁹ Although, making the general case for run-time optimization, he discusses intermodule
optimizations where this is not the case [Kistler 1997].
¹⁰ We use the term "simulate" in preference to "emulate" as the latter has the connotation that
hardware is heavily involved in the process. However, some literature uses the words interchangeably.

May [1987] proposed that simulators could be categorized by their implementation
technique into three generations. To this, we add a fourth generation to
characterize more recent work.
(1) First generation simulators were interpreters, which would simply interpret each
    source instruction as needed. As might be expected, these tended to exhibit
    poor performance due to interpretation overhead.
(2) Second generation simulators dynamically translated source instructions into
    target instructions one at a time, caching the translations for later use.
(3) Third generation simulators improved upon the performance of second
    generation simulators by dynamically translating entire blocks of source instructions
    at a time. This introduces new questions as to what should be translated.
    Most such systems translated either basic blocks of code or extended basic
    blocks [Cmelik and Keppel 1994], reflecting the static control flow of the source
    program. Other static translation units are possible: one anomalous system,
    DAISY, performed page-at-a-time translations from PowerPC to VLIW
    instructions [Ebcioğlu and Altman 1996; Ebcioğlu and Altman 1997].
(4) What we call fourth generation simulators expand upon the third generation
    by dynamically translating paths, or traces. A path reflects the control flow
    exhibited by the source program at run-time, a dynamic instead of a static unit
    of translation. The most recent work on binary translation is concentrated on
    this type of system.
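To make the second generation concrete, here is a small Python sketch that translates one source instruction at a time and caches each translation by its address. Everything about it is an assumption for illustration: the three-opcode source instruction set ("addi", "bnez", "halt"), the `Simulator` class, and the use of Python closures as the "target code".

```python
class Simulator:
    def __init__(self, program):
        self.program = program
        self.cache = {}            # source pc -> translated instruction
        self.translations = 0

    def translate(self, pc):
        """Translate one source instruction into a closure that returns the next pc."""
        self.translations += 1
        op, *args = self.program[pc]
        if op == "addi":                       # ("addi", register, immediate)
            reg, imm = args
            def target(regs):
                regs[reg] += imm
                return pc + 1
        elif op == "bnez":                     # ("bnez", register, branch target)
            reg, dest = args
            def target(regs):
                return dest if regs[reg] != 0 else pc + 1
        elif op == "halt":                     # ("halt",)
            def target(regs):
                return None
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return target

    def run(self, regs):
        pc = 0
        while pc is not None:
            if pc not in self.cache:           # translate on first use only
                self.cache[pc] = self.translate(pc)
            pc = self.cache[pc](regs)
        return regs

# A countdown loop: add 2 to acc, decrement n, repeat while n is nonzero.
loop = [("addi", "acc", 2), ("addi", "n", -1), ("bnez", "n", 0), ("halt",)]
sim = Simulator(loop)
final = sim.run({"acc": 0, "n": 5})
```

The loop body executes five times, yet each of the four instructions is translated only once; a third generation design would instead translate the three-instruction loop body as a single block.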
Fourth generation simulators are predominant in recent literature [Bala et al.
1999; Chen et al. 2000; Deaver et al. 1999; Gschwind et al. 2000; Klaiber 2000;
Zheng and Thompson 2000]. The structure of these is fairly similar:
(1) Profiled execution. The simulator's effort should be concentrated on "hot" areas
    of code that are frequently executed. For example, initialization code that is
    executed only once should not be translated or optimized. To determine which
    execution paths are hot, the source program is executed in some manner and
    profile information is gathered. Time invested in doing this is assumed to be
    recouped eventually.
    When source and target architectures are dissimilar, or the source
    architecture is uncomplicated (such as a RISC processor) then interpretation of the
    source program is typically employed to execute the source program [Bala et al.
    1999; Gschwind et al. 2000; Transmeta Corporation 2001; Zheng and
    Thompson 2000]. The alternative approach, direct execution, is best summed up by
    Rosenblum, Herrod, Witchel, and Gupta [1995, page 36]:
    By far the fastest simulator of the CPU, MMU, and memory system
    of an SGI multiprocessor is an SGI multiprocessor.
    In other words, when the source and target architectures are the same, as in the
    case where the goal is dynamic optimization of a source program, the source
    program can be executed directly by the CPU. The simulator regains control
    periodically as a result of appropriately modifying the source program [Chen
    et al. 2000] or by less direct means such as interrupts [Gorton 2001].
(2) Hot path dete tion. In lieu of hardware support, hot paths may be dete ted by
keeping ounters to re ord frequen y of exe ution [Zheng and Thompson 2000℄,
or by wat hing for ode that is stru turally likely to be hot, like the target of
a ba kwards bran h [Bala et al. 1999℄. With hardware support, the program's
program ounter an be sampled at intervals to dete t hot spots [Deaver et al.
Some other onsiderations are that paths may be strategi ally ex luded if they
are too expensive or diÆ ult to translate [Zheng and Thompson 2000℄, and
hoosing good stopping points for paths an be as important as hoosing good
starting points in terms of keeping a manageable number of tra es [Gs hwind
et al. 2000℄.
(3) Code generation and optimization. Once a hot path has been noted, the simulator will translate it into code for the target architecture, or perhaps optimize
the code. The correctness of the translation is always at issue, and some empirical verification techniques are discussed in [Zheng and Thompson 2000].
(4) "Bail-out" mechanism. In the case of dynamic optimization systems (where the
source and target architectures are the same), there is the potential for a negative impact on the source program's performance. A bail-out mechanism [Bala
et al. 1999] heuristically tries to detect such a problem and revert to the
source program's direct execution; this can be spotted, for example, by monitoring the stability of the working set of paths. Such a mechanism can also be
used to avoid handling complicated cases.
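As an illustration of the counter-based detection described in item (2), the following Python sketch counts how often each backward-branch target is reached and reports the targets that cross a threshold. It is a minimal sketch under stated assumptions, not the method of any system cited above: the trace format, the function name find_hot_heads, and the threshold value are all hypothetical.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; real systems tune this empirically


def find_hot_heads(branch_trace, threshold=HOT_THRESHOLD):
    """Return backward-branch targets reached at least `threshold` times.

    `branch_trace` is an iterable of (source_pc, target_pc) pairs, one per
    taken branch, in the order a simulator might record them (assumed format).
    """
    counts = Counter()
    hot = []
    for source_pc, target_pc in branch_trace:
        if target_pc <= source_pc:       # backward branch: likely a loop head
            counts[target_pc] += 1
            if counts[target_pc] == threshold:
                hot.append(target_pc)    # report each target once, when it turns hot
    return hot
```

Forward branches are ignored entirely, which reflects the structural heuristic that loop heads are the most likely starting points for hot paths.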
Another recurring theme in recent binary translation work is the issue of hardware support for binary translation, especially for translating code for legacy architectures into VLIW code. This has attracted interest because VLIW architectures promise legacy architecture implementations which have higher performance,
greater instruction-level parallelism [Ebcioğlu and Altman 1996; Ebcioğlu and Altman 1997], higher clock rates [Altman et al. 2000; Gschwind et al. 2000], and lower
power requirements [Klaiber 2000]. Binary translation work in these processors is
still done by software at run-time, and is thus still dynamic binary translation, although occasionally packaged under more fanciful names to enrapture venture capitalists [Geppert and Perry 2000]. The key idea in these systems is that, for efficiency,
the target VLIW should provide a superset of the source architecture [Ebcioğlu and
Altman 1997]; these extra resources, unseen by the source program, can be used
by the binary translator for aggressive optimizations or to simulate troublesome
aspects of the source architecture.


2.11 Java

Java is implemented by static compilation to bytecode instructions for the Java
virtual machine, or JVM. Early JVMs were only interpreters, resulting in less-than-stellar performance:
Interpreting bytecodes is slow. [Cramer et al. 1997, page 37]
Java isn't just slow, it's really slow, surprisingly slow. [Tyma 1998, page
Regardless of how vitriolic the expression, the message was that Java programs
had to run faster, and the primary means looked to for accomplishing this was JIT
compilation of Java bytecodes. Indeed, Java brought the term "just-in-time" into
common use in computing literature.11 Unquestionably, the pressure for fast Java
implementations spurred a renaissance in JIT research; at no other time in history
has such concentrated time and money been invested in it.
An early view of Java JIT compilation is given by Cramer, Friedman, Miller, Seberger, Wilson, and Wolczko [1997], who were engineers at Sun Microsystems, the
progenitor of Java. They observed that there is an upper bound on
the speedup achievable by JIT compilation, noting that interpretation proper only
accounted for 68% of execution time in a profile they ran. They also advocated
the direct use of JVM bytecodes, a stack-based instruction set, as an intermediate
representation for JIT compilation and optimization. In retrospect, this is a minority viewpoint; most later work, including Sun's own [Sun Microsystems 2001],
began by converting JVM code into a register-based intermediate representation.
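The upper bound mentioned above can be made concrete with a little arithmetic. The sketch below assumes, as a simplification that is ours and not taken from Cramer et al., that JIT compilation speeds up only the interpretation share of run time and leaves the rest untouched; the function names are hypothetical.

```python
def jit_speedup(interp_fraction, interp_speedup):
    """Overall speedup when only the interpretation share of run time gets faster.

    interp_fraction: share of run time spent in interpretation proper (0..1).
    interp_speedup:  factor by which that share is accelerated (> 0).
    """
    if not 0.0 <= interp_fraction <= 1.0:
        raise ValueError("interp_fraction must lie in [0, 1]")
    if interp_speedup <= 0:
        raise ValueError("interp_speedup must be positive")
    return 1.0 / ((1.0 - interp_fraction) + interp_fraction / interp_speedup)


def jit_speedup_bound(interp_fraction):
    """Limit of jit_speedup as the interpretation cost drops to zero."""
    if not 0.0 <= interp_fraction <= 1.0:
        raise ValueError("interp_fraction must lie in [0, 1]")
    if interp_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - interp_fraction)
```

Under this assumption, a 68% interpretation share caps the overall speedup at 1 / (1 - 0.68), a little over three times, no matter how good the generated code is.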
The interesting trend in Java JIT work [Adl-Tabatabai et al. 1998; Bik et al.
1999; Burke et al. 1999; Cierniak and Li 1997; Ishizaki et al. 1999; Krall and Grafl
1997; Krall 1998; Yang et al. 1999] is the implicit assumption that mere translation
from bytecode to native code is not enough: code optimization is necessary too.
At the same time, this work recognizes that traditional optimization techniques
are expensive, and looks for modifications to optimization algorithms that strike a
balance between speed of algorithm execution and speed of the resulting code.
There have also been approaches to Java JIT compilation besides the usual
interpret-first-optimize-later. A compile-only strategy, with no interpreter whatsoever, was adopted by Burke et al. [1999], who also implemented their system
in Java; improvements to their JIT directly benefited their system. Agesen [1997]
translated JVM bytecodes into Self code, to leverage optimizations already existing
in the Self compiler. Annotations were tried by Azevedo, Nicolau, and Hummel
[1999] to shift the effort of code optimization prior to run-time: information needed
for efficient JIT optimization was precomputed and tagged on to bytecode as annotations, which were then used by the JIT system to assist its work. Finally, Plezbert
and Cytron [1997] proposed and evaluated the idea of "continuous compilation" for
Java in which an interpreter and compiler would execute concurrently, preferably
on separate processors.12

11 Gosling [2001] points out that the term "just-in-time" is borrowed from manufacturing terminology, and traces his own use of the term back to about 1993.

In the course of surveying JIT work, some common attributes emerged. We propose
that JIT systems can be classified according to three properties.
(1) Invocation. A JIT compiler is explicitly invoked if the user must take some
action to cause compilation at run-time. An implicitly invoked JIT compiler is
transparent to the user.
(2) Executability. JIT systems typically involve two languages: a source language
to translate from, and a target language to translate to (although these languages can be the same, if the JIT system is only performing optimization
on-the-fly). We call a JIT system monoexecutable if it can only execute one
of these languages; polyexecutable if more than one can be executed. Polyexecutable JIT systems have the luxury of deciding when compiler invocation is
warranted, since either program representation can be used.
(3) Concurrency. This property characterizes how the JIT compiler executes, relative to the program itself. If program execution pauses under its own volition
to permit compilation, it is not concurrent; the JIT compiler in this case may
be invoked via subroutine call, message transmission, or transfer of control to a
coroutine. In contrast, a concurrent JIT compiler can operate concurrently as the program
executes: in a separate thread or process, even on a different
processor.
JIT systems that function in hard real-time may constitute a fourth classifying
property, but there seems to be little research in the area at present; it is unclear
if hard real-time constraints pose any unique problems to JIT systems.
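The three properties can be captured in a small record type, which may be convenient when tabulating systems. The following Python sketch is only one possible encoding; the class and method names are hypothetical, and the example instance at the end describes an imaginary system rather than any surveyed one.

```python
from dataclasses import dataclass
from enum import Enum


class Invocation(Enum):
    EXPLICIT = "explicit"      # user must act to trigger compilation
    IMPLICIT = "implicit"      # compilation is transparent to the user


class Executability(Enum):
    MONO = "monoexecutable"    # only one of source/target language can run
    POLY = "polyexecutable"    # more than one representation can run


@dataclass(frozen=True)
class JitClassification:
    invocation: Invocation
    executability: Executability
    concurrent: bool           # True if the compiler runs alongside the program

    def can_defer_compilation(self):
        """Polyexecutable systems may choose when to compile, since either form runs."""
        return self.executability is Executability.POLY

    def describe(self):
        mode = "concurrent" if self.concurrent else "not concurrent"
        return f"{self.invocation.value}, {self.executability.value}, {mode}"


# An imaginary system, used purely as an example.
example_system = JitClassification(Invocation.IMPLICIT, Executability.POLY, True)
```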

General, portable tools for JIT compilation that help with the dynamic generation
of binary code did not appear until relatively recently. To varying degrees, these
toolkits address three issues:
(1) Binary code generation. As argued in [Ramsey and Fernández 1995], and as
personal experience attests, emitting binary code such as machine language is
a situation rife with opportunity for error. There are associated bookkeeping
tasks too: information may not yet be available upon initial code generation,
like the location of forward branch targets. Once discovered, the information
must be backpatched into the appropriate locations.
(2) Cache coherence. CPU speed advances have far outstripped memory speed
advances in recent years [Hennessy and Patterson 1996]. To compensate, modern CPUs incorporate a small, fast cache memory, the contents of which may
get temporarily out of sync with main memory. When dynamically generating code, care must be taken to ensure that the cache contents reflect code
written to main memory before execution is attempted. The situation is even
more complicated when several CPUs share a single memory. Keppel [1991] contains
a detailed discussion.
(3) Execution. The hardware or operating system may impose restrictions which
limit where executable code may reside. For example, memory earmarked for
data may not allow execution (i.e., instruction fetches) by default, meaning
that code could be generated into the data memory, but not executed without
platform-specific wrangling. Again, refer to Keppel [1991].

Table 1. Comparison of JIT toolkits.
Toolkits compared: [Engler 1996], [Engler and Proebsting 1994], [Fraser and Proebsting 1999], [Keppel 1991], [Ramsey and Fernández 1995].
Columns: Code Generation, Cache Coherence, Execution, Abstract Interface, Input.
Input entries: ad hoc, postfix, ad hoc.

12 Only compilation occurred concurrently, and only happened once, as opposed to the ongoing
optimization of Kistler's "continuous optimization" [Kistler 2001].
Only the first issue is relevant for JIT compilation to interpreted virtual machine
code (interpreters don't directly execute the code they interpret), but there is
no reason why JIT compilation tools cannot be useful for generation of non-native
code as well.
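The backpatching bookkeeping mentioned under binary code generation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the interface of any toolkit discussed here: the CodeBuffer class, its two-byte instruction format, and the opcode values are all hypothetical.

```python
class CodeBuffer:
    """Toy emitter for a made-up bytecode: one opcode byte plus one operand byte."""

    NOP = 0x00    # hypothetical opcodes, not taken from any real machine
    JUMP = 0x01

    def __init__(self):
        self.code = bytearray()
        self.labels = {}    # label name -> byte offset, once known
        self.fixups = []    # (operand offset, label name) still awaiting a target

    def emit(self, opcode, operand=0):
        self.code += bytes([opcode, operand])

    def emit_jump(self, label):
        # The target may not be known yet: emit a placeholder operand and
        # remember where it lives so it can be patched later.
        self.fixups.append((len(self.code) + 1, label))
        self.emit(self.JUMP, 0)

    def define_label(self, label):
        self.labels[label] = len(self.code)

    def finish(self):
        # Backpatch every recorded placeholder with its now-known target.
        # An undefined label raises KeyError; a target above 255 raises ValueError.
        for offset, label in self.fixups:
            self.code[offset] = self.labels[label]
        self.fixups.clear()
        return bytes(self.code)
```

For example, emitting a jump to a label "end", then two NOPs, then defining "end" and emitting one more NOP yields a jump whose operand is patched to offset 6 when finish() is called.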
Table 1 gives a comparison of the toolkits. In addition to indicating how well
the toolkits support the three areas above, we have added two extra categories.
First, an "abstract interface" is one that is architecture-independent. Use of a
toolkit's abstract interface implies that very little, if any, of the user's code needs
modification in order to use a new platform. The drawbacks are that architecture-dependent operations like register allocation may be difficult, and the mapping
from abstract to actual machine may be suboptimal, such as a mapping from RISC
abstraction to CISC machinery.
Second, "input" refers to the structure, if any, of the input expected by the
toolkit. With respect to JIT compilation, more complicated input structures take
more time and space for the user to produce and the toolkit to consume [Engler
Using a tool may solve some problems but introduce others. Tools for binary
code generation help avoid many errors compared to manually emitting binary
code. These tools, however, require detailed knowledge of binary instruction formats whose specification may itself be prone to error. Engler and Hsieh [2000]
present a "metatool" that can automatically derive these instruction encodings by
repeatedly querying the existing system assembler with varying inputs.

Dynamic, or just-in-time, compilation is an old implementation technique with a
fragmented history. By collecting this historical information together, we hope to
shorten the voyage of rediscovery.

Thanks to Nigel Horspool, Shannon Jaeger, and Mike Zastre, who proofread and
commented on drafts of this paper. Also, thanks to Rick Gorton, James Gosling,
Thomas Kistler, Ralph Mauriello, and Jim Mitchell for supplying historical information and clarifications. Evelyn Duesterwald's PLDI 2000 tutorial notes were
helpful in preparing Section 2.9.
Abrams, P. S. 1970. An APL Machine. Ph.D. thesis, Stanford University. SLAC Report
Adl-Tabatabai, A.-R., Cierniak, M., Lueh, G.-Y., Parikh, V. M., and Stichnoth, J. M. 1998. Fast, effective code generation in a just-in-time Java compiler. In PLDI '98 (1998), pp. 280-290.
Agesen, O. 1996. Concrete Type Inference: Delivering Object-Oriented Applications. Ph.D. thesis, Stanford University. Also as SMLI TR-96-52, Sun Microsystems, January 1996.
Agesen, O. 1997. Design and implementation of Pep, a Java just-in-time translator. Theory and Practice of Object Systems 3, 2, 127-155.
Agesen, O. and Hölzle, U. 1995. Type feedback vs. concrete type inference: A comparison of optimization techniques for object-oriented languages. In OOPSLA '95 (1995), pp. 91-
Altman, E., Gschwind, M., Sathaye, S., Kosonocky, S., Bright, A., Fritts, J., Ledak, P., Appenzeller, D., Agricola, C., and Filan, Z. 2000. BOA: The architecture of a binary translation processor. Technical Report RC 21665, IBM Research Division.
Altman, E. R., Kaeli, D., and Sheffer, Y. 2000. Welcome to the opportunities of binary translation. IEEE Computer 33, 3 (March), 40-45.
Auslander, J., Philipose, M., Chambers, C., Eggers, S. J., and Bershad, B. N. 1996. Fast, effective dynamic compilation. In PLDI '96 (1996), pp. 149-159.
Azevedo, A., Nicolau, A., and Hummel, J. 1999. Java annotation-aware just-in-time (AJIT) compilation system. In JAVA '99 (1999), pp. 142-151.
Bala, V., Duesterwald, E., and Banerjia, S. 1999. Transparent dynamic optimization. Technical Report HPL-1999-77, Hewlett-Packard.
Bartlett, J. 1992. Familiar Quotations (16th ed.). Little, Brown and Company. J. Kaplan,
Bentley, J. 1988. Little languages. In More Programming Pearls, pp. 83-100. Addison-Wesley.
Bik, A. J. C., Girkar, M., and Haghighat, M. R. 1999. Experiences with Java JIT optimization. In Innovative Architecture for Future Generation High-Performance Processors and Systems (1999), pp. 87-94. IEEE.
Brown, P. J. 1976. Throw-away compiling. Software: Practice and Experience 6, 423-
Brown, P. J. 1990. Writing Interactive Compilers and Interpreters. Wiley.

Burke, M. G., Choi, J.-D., Fink, S., Grove, D., Hind, M., Sarkar, V., Serrano, M. J., Sreedhar, V. C., and Srinivasan, H. 1999. The Jalapeño dynamic optimizing compiler for Java. In JAVA '99 (1999), pp. 129-141.
Chambers, C. 1992. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. Ph.D. thesis, Stanford University.
Chambers, C. and Ungar, D. 1989. Customization: Optimizing compiler technology for Self, a dynamically-typed object-oriented programming language. In PLDI '89 (1989), pp.
Chambers, C. and Ungar, D. 1990. Iterative type analysis and extended message splitting: Optimizing dynamically-typed object-oriented programs. In PLDI '90 (1990), pp. 150-164.
Chambers, C. and Ungar, D. 1991. Making pure object-oriented languages practical. In OOPSLA '91 (1991), pp. 1-15.
Chambers, C., Ungar, D., and Lee, E. 1989. An efficient implementation of Self, a dynamically-typed object-oriented programming language based on prototypes. In OOPSLA '89 (1989), pp. 49-70.
Chen, W.-K., Lerner, S., Chaiken, R., and Gillies, D. M. 2000. Mojo: A dynamic optimization system. In Third ACM Workshop on Feedback-directed and Dynamic Optimization (FDDO-3) (December 2000).
Cierniak, M. and Li, W. 1997. Briki: an optimizing Java compiler. In Proceedings IEEE COMPCON '97 (1997), pp. 179-184.
Cmelik, B. and Keppel, D. 1994. Shade: A fast instruction-set simulator for execution profiling. In Proceedings of the 1994 Conference on Measurement and Modeling of Computer Systems (1994), pp. 128-137.
Consel, C. and Noel, F. 1996. A general approach for run-time specialization and its application to C. In POPL '96 (1996), pp. 145-156.
Cramer, T., Friedman, R., Miller, T., Seberger, D., Wilson, R., and Wolczko, M. 1997. Compiling Java just in time. IEEE Micro 17, 3 (May/June), 36-43.
Dakin, R. J. and Poole, P. C. 1973. A mixed code approach. The Computer Journal 16, 3,
Dawson, J. L. 1973. Combining interpretive code with machine code. The Computer Journal 16, 3, 216-219.
Deaver, D., Gorton, R., and Rubin, N. 1999. Wiggins/Redstone: An on-line program specializer. In Proceedings IEEE Hot Chips XI Conference (August 1999).
Deutsch, L. P. and Schiffman, A. M. 1984. Efficient implementation of the Smalltalk-80 system. In POPL '84 Proceedings (1984), pp. 297-302.
Dieckmann, S. and Hölzle, U. 1997. The space overhead of customization. Technical Report TRCS 97-21 (December), University of California at Santa Barbara.
Ebcioğlu, K. and Altman, E. R. 1996. DAISY: Dynamic compilation for 100% architectural compatibility. Technical Report RC 20538, IBM Research Division.
Ebcioğlu, K. and Altman, E. R. 1997. Daisy: Dynamic compilation for 100% architectural compatibility. In ISCA '97 (1997), pp. 26-37.
Engler, D. R. 1996. VCODE: A retargetable, extensible, very fast dynamic code generation system. In PLDI '96 (1996), pp. 160-170.
Engler, D. R. and Hsieh, W. C. 2000. DERIVE: A tool that automatically reverse-engineers instruction encodings. In ACM SIGPLAN Workshop on Dynamic and Adaptive Compilation and Optimization (Dynamo '00) (2000), pp. 12-22.
Engler, D. R., Hsieh, W. C., and Kaashoek, M. F. 1996. `C: A language for high-level, efficient, and machine-independent dynamic code generation. In POPL '96 (1996), pp.
Engler, D. R. and Proebsting, T. A. 1994. DCG: An efficient, retargetable dynamic code generation system. In ASPLOS VI (1994), pp. 263-272.
Franz, M. 1994. Code-Generation On-the-Fly: A Key to Portable Software. Ph.D. thesis, ETH Zurich.
Franz, M. and Kistler, T. 1997. Slim binaries. CACM 40, 12 (December), 87-94.
Fraser, C. W. and Proebsting, T. A. 1999. Finite-state code generation. In PLDI '99 (1999), pp. 270-280.
Geppert, L. and Perry, T. S. 2000. Transmeta's magic show. IEEE Spectrum 37, 5 (May),
Goldberg, A. and Robson, D. 1985. Smalltalk-80: The Language and its Implementation.
Gorton, R. 2001. Private communication.
Gosling, J. 2001. Private communication.

Gschwind, M., Altman, E. R., Sathaye, S., Ledak, P., and Appenzeller, D. 2000. Dynamic and transparent binary translation. IEEE Computer 33, 3, 54-59.
Hammond, J. 1977. BASIC - an evaluation of processing methods and a study of some programs. Software: Practice and Experience 7, 697-711.
Hansen, G. J. 1974. Adaptive Systems for the Dynamic Run-Time Optimization of Programs. Ph.D. thesis, Carnegie-Mellon University.
Hennessy, J. L. and Patterson, D. A. 1996. Computer Architecture: A Quantitative Approach (Second ed.). Morgan Kaufmann.
Hölzle, U. 1994. Adaptive Optimization for Self: Reconciling High Performance with Exploratory Programming. Ph.D. thesis, Carnegie-Mellon University.
Hölzle, U. and Ungar, D. 1994a. Optimizing dynamically-dispatched calls with run-time type feedback. In PLDI '94 (1994), pp. 326-336.
Hölzle, U. and Ungar, D. 1994b. A third-generation Self implementation: Reconciling responsiveness with performance. In OOPSLA '94 (1994), pp. 229-243.
Ishizaki, K., Kawahito, M., Yasue, T., Takeuchi, M., Ogasawara, T., Suganuma, T., Onodera, T., Komatsu, H., and Nakatani, T. 1999. Design, implementation, and evaluation of optimizations in a just-in-time compiler. In JAVA '99 (1999), pp. 119-128.
Johnston, R. L. 1977. The dynamic incremental compiler of APL\3000. In APL '79 Conference Proceedings, Volume 9 of APL Quote Quad (June 1977), pp. 82-87. Number 4, Part 1.
Jones, N. D., Gomard, C. K., and Sestoft, P. 1993. Partial Evaluation and Automatic Program Generation. Prentice Hall.
Keppel, D. 1991. A portable interface for on-the-fly instruction space modification. In ASPLOS IV (1991), pp. 86-95.
Keppel, D., Eggers, S. J., and Henry, R. R. 1991. A case for runtime code generation. Technical Report 91-11-04, University of Washington Department of Computer Science and
Kistler, T. 1997. Dynamic runtime optimization. In Proceedings of the Joint Modular Languages Conference (JMLC '97) (1997), pp. 53-66.
Kistler, T. 1999. Continuous Program Optimization. Ph.D. thesis, University of California, Irvine.
Kistler, T. 2001. Private communication.
Kistler, T. and Franz, M. 1999. The case for dynamic optimization: Improving memory-hierarchy performance by continuously adapting the internal storage layout of heap objects at run-time. Technical Report 99-21 (May), University of California, Irvine. Revised September, 1999.
Klaiber, A. 2000. The technology behind Crusoe processors. Technical report (January), Transmeta Corporation.
Knuth, D. E. 1971. An empirical study of Fortran programs. Software: Practice and Experience 1, 105-133.
Krall, A. 1998. Efficient JavaVM just-in-time compilation. In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (PACT '98) (1998), pp. 205-212.
Krall, A. and Grafl, R. 1997. A Java just-in-time compiler that transcends JavaVM's 32 bit barrier. In PPoPP '97 Workshop on Java for Science and Engineering (1997).
Lee, P. and Leone, M. 1996. Optimizing ML with run-time code generation. In PLDI '96 (1996), pp. 137-148.
Lee, S., Yang, B.-S., Kim, S., Park, S., Moon, S.-M., Ebcioğlu, K., and Altman, E. 2000. Efficient Java exception handling in just-in-time compilation. In Java 2000 (2000), pp. 1-8.
Leone, M. and Lee, P. 1994. Lightweight run-time code generation. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation (1994), pp. 97-106.


Marlet, R., Consel, C., and Boinot, P. 1999. Efficient incremental run-time specialization for free. In PLDI '99 (1999), pp. 281-292.
Mauriello, R. 2000. Private communication.
May, C. 1987. Mimic: A fast System/370 simulator. In Proceedings of the SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques (June 1987), pp. 1-13.
McCarthy, J. 1960. Recursive functions of symbolic expressions and their computation by machine, part I. Commun. ACM 3, 4, 184-195.
McCarthy, J. 1981. History of LISP. In R. L. Wexelblat Ed., History of Programming Languages, pp. 173-185. Academic Press.
Miller, T. C. 1977. Tentative compilation: A design for an APL compiler. In APL '79 Conference Proceedings, Volume 9 of APL Quote Quad (June 1977), pp. 88-95. Number 4, Part 1.
Mitchell, J. G. 1970. The Design and Construction of Flexible and Efficient Interactive Programming Systems. Ph.D. thesis, Carnegie-Mellon University.
Mitchell, J. G. 2000. Private communication.
Mitchell, J. G., Perlis, A. J., and van Zoeren, H. R. 1968. LC2: A language for conversational computing. In M. Klerer and J. Reinfelds Eds., Interactive Systems for Experimental Applied Mathematics. Academic Press. Proceedings of 1967 ACM Symposium.
Mock, M., Berryman, M., Chambers, C., and Eggers, S. J. 1999. Calpa: A tool for automating dynamic compilation. In Second ACM Workshop on Feedback-directed and Dynamic Optimization (1999), pp. 100-109.
Ng, T. S. and Cantoni, A. 1976. Run time interaction with FORTRAN using mixed code. The Computer Journal 19, 1, 91-92.
Pittman, T. 1987. Two-level hybrid interpreter/native code execution for combined space-time program efficiency. In SIGPLAN Symposium on Interpreters and Interpretive Techniques (1987), pp. 150-152.
Plezbert, M. P. and Cytron, R. K. 1997. Does "just in time" = "better late than never"? In POPL '97 (1997), pp. 120-131.
Poletto, M., Engler, D. R., and Kaashoek, M. F. 1997. tcc: A system for fast, flexible, and high-level dynamic code generation. In PLDI '97 (1997), pp. 109-121.
Ramsey, N. and Fernández, M. 1995. The New Jersey machine-code toolkit. In Proceedings of the 1995 USENIX Technical Conference (1995), pp. 289-302.
Rau, B. R. 1978. Levels of representation of programs and the architecture of universal host machines. In Proceedings of the 11th Annual Microprogramming Workshop (MICRO-11) (1978), pp. 67-79.
Rosenblum, M., Herrod, S. A., Witchel, E., and Gupta, A. 1995. Complete computer system simulation: The SimOS approach. IEEE Parallel and Distributed Technology 3, 4 (Winter), 34-43.
Schroeder, S. C. and Vaughn, L. E. 1973. A high order language optimal execution processor: Fast Intent Recognition System (FIRST). In Proceedings of a Symposium on High-Level-Language Computer Architecture, Volume 8 of SIGPLAN (November 1973), pp. 109-116. Number 11.
Sebesta, R. W. 1999. Concepts of Programming Languages (Fourth ed.). Addison-Wesley.
Smith, R. B. and Ungar, D. 1995. Programming as an experience: The inspiration for Self. In ECOOP '95 (1995).
Sun Microsystems. 2001. The Java HotSpot virtual machine. White paper.
Thompson, K. 1968. Regular expression search algorithm. Commun. ACM 11, 6 (June),
Transmeta Corporation. 2001. Code morphing software. http://www.transmeta.com/technology/architecture/code_morphing.html.
Tyma, P. 1998. Why are we using Java again? Commun. ACM 41, 6, 38-42.
Ung, D. and Cifuentes, C. 2000. Machine-adaptable dynamic binary translation. In Dynamo '00 (2000), pp. 41-51.

Ungar, D. and Smith, R. B. 1987. Self: The power of simplicity. In OOPSLA '87 (1987), pp. 227-242.
Ungar, D., Smith, R. B., Chambers, C., and Hölzle, U. 1992. Object, message, and performance: How they coexist in Self. IEEE Computer 25, 10 (October), 53-64.
University of Michigan. 1966a. The System Loader. In University of Michigan Executive System for the IBM 7090 Computer, Volume 1.
University of Michigan. 1966b. The "University of Michigan Assembly Program" ("UMAP"). In University of Michigan Executive System for the IBM 7090 Computer, Volume 2.
van Dyke, E. J. 1977. A dynamic incremental compiler for an interpretive language. Hewlett-Packard Journal 28, 11 (July), 17-24.
Wickline, P., Lee, P., and Pfenning, F. 1998. Run-time code generation and Modal-ML. In PLDI '98 (1998), pp. 224-235.
Wirth, N. and Gutknecht, J. 1989. The Oberon system. Software: Practice and Experience 19, 9 (September), 857-893.
Yang, B.-S., Moon, S.-M., Park, S., Lee, J., Lee, S., Park, J., Chung, Y. C., Kim, S., Ebcioğlu, K., and Altman, E. 1999. LaTTe: A Java VM just-in-time compiler with fast and efficient register allocation. In International Conference on Parallel Architectures and Compilation Techniques (1999), pp. 128-138. IEEE.
Zheng, C. and Thompson, C. 2000. PA-RISC to IA-64: Transparent execution, no recompilation. IEEE Computer 33, 3 (March), 47-52.