Efficient and General On-Stack Replacement for Aggressive Program Specialization

Sunil Soman    Chandra Krintz
Computer Science Department
University of California, Santa Barbara, CA 93106
E-mail:
Abstract

Efficient invalidation and dynamic replacement of executing code – on-stack replacement (OSR) – is necessary to facilitate effective, aggressive specialization of object-oriented programs that are dynamically loaded, incrementally compiled, and garbage collected. Extant OSR mechanisms restrict the performance potential of program specialization since their implementations are special-purpose and restrict compiler optimization.

In this paper, we present a novel, general-purpose OSR mechanism that is more amenable to optimization than prior approaches. In particular, we decouple the OSR implementation from the optimization process and update the program state information incrementally during optimization. Our OSR implementation efficiently enables the use of code specializations that are invalidated by any event – including those external to program code execution. We improve code quality over the extant state-of-the-art, resulting in performance gains of 1-31%, and 9% on average.
Keywords: Program specialization, on-stack replacement, code invalidation, virtual machines, Java.
1 Introduction
Advanced implementations of modern object-oriented languages, such as Java [12], depend on runtime environments that employ "just-in-time" compilation for performance. These language environments dynamically compile program code that is loaded incrementally, and possibly from a remote target. Although optimization has the potential to significantly improve program performance, on-the-fly and incremental compilation makes optimization implementation challenging. In particular, the compiler must predict which optimizations will be most successful for the remainder of program execution.

One successful approach to enable more aggressive optimizations for such systems is to specialize the program for a particular behavior or set of execution conditions, and then to invalidate the optimization when conditions change. This enables the compiler to identify optimizations that are likely to be successful in the near term, and then to apply others as it learns more about future program and resource behavior. Examples of such optimizations include virtual call inlining, exception handler removal, and memory management system optimizations.

Key to the success of such specializations is the presence of an efficient, general-purpose mechanism for undoing optimization when associated assumptions are rendered invalid. Future method invocations can be handled by recompilation. Replacing the currently executing method and re-initiating execution in the new version is termed on-stack replacement (OSR). OSR has previously been successfully employed for debugging optimized code [14], for deferred compilation [7, 23, 10, 19], and for method optimization-level promotion [15, 10, 19].

Each of these prior approaches, however, is specific to the particular optimization being performed. Moreover, extant approaches inhibit compiler optimizations by employing special instructions for state collection, in-line with generated program code. Such implementations, in addition to increasing code size, increase variable live ranges and restrict code motion.

We present a general-purpose OSR implementation, which can be used for any OSR-based specialization, and which is more amenable to compiler optimization. We decouple OSR from the program code and, consequently, from compiler optimizations. This significantly improves code quality over an extant, state-of-the-art approach, and enables OSR to occur at any point at which control can transfer out of an executing method. We can, therefore, support existing OSR-based specializations as well as other, more aggressive optimizations that are triggered by events external to the executing code, including class loading, changes in program and JVM behavior, exception conditions, resource availability, and user events, such as dynamic software updates.
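To make this specialize-then-invalidate pattern concrete, the following is a minimal Java sketch of guarded virtual-call inlining; it is our illustration, not the paper's code, and the Shape/Circle names and fallback path are purely hypothetical.

// Hypothetical sketch of guarded specialization (not the paper's code).
// A JIT that has observed only Circle implementations of Shape can inline
// Circle.area() behind a guard. If the assumption later fails (e.g., class
// loading introduces a new Shape subclass), the runtime invalidates the
// specialized code and uses OSR to move the running activation to a
// recompiled, general version.
interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class SpecializedCaller {
    static double areaSpecialized(Shape s) {
        if (s.getClass() == Circle.class) {   // guard on the inlining assumption
            Circle c = (Circle) s;
            return Math.PI * c.r * c.r;       // inlined body of Circle.area()
        }
        // Slow path: a real VM would trigger invalidation and OSR here;
        // this sketch simply falls back to the virtual call.
        return s.area();
    }
}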
In the sections that follow, we provide background on OSR (Section 2) and detail our extensions and implementation in the Jikes Research Virtual Machine (JikesRVM) [1] (Section 3). We then describe three OSR-based specializations (Section 4). The first two are existing specializations for virtual method dispatch and for dynamic GC switching. Our OSR implementation enables performance improvement of 6% for the former technique, and of 9% for the latter. We also present a novel specialization for generational garbage collection in which we avoid adding write barriers to generated code until they are required (i.e., until there are objects in the old generation(s)). Our results indicate that we improve startup time for programs by 6% on average; for short-running programs we reduce execution time by 8-14%. Following our empirical evaluation and analysis (Section 5), we present our conclusions and plans for future work (Section 6).
2 Background and Related Work
On-stack replacement (OSR) [7, 15] is a four-step process. The runtime extracts the execution state (current variable values and program counter) from a particular method. The compilation system then recompiles the method. Next, the compiler generates a stack activation frame and updates it with the values extracted from the previous version. Finally, the system replaces the old activation frame with the new one and restarts execution in the new version.

Given OSR functionality, a dynamic compilation system can implement aggressive specializations based on conditions that may change as execution progresses. If these conditions change as a result of an external or a runtime event, the compilation system can recompile (and possibly re-optimize) the code, and replace the currently executing version [7]. OSR has been employed by extant virtual execution environments for dynamically de-optimizing code for debugging [14], for deferred compilation of method regions to avoid compilation overhead and improve data flow [7, 23, 10, 19], and to optimize methods that execute unoptimized for a long time without returning [15, 10, 19].

These OSR implementations were specially designed for a target purpose. For example, OSR for deferred compilation is implemented as an unconditional branch to the OSR invocation routine. OSR for method promotion and de-optimization is implemented as part of the thread yield process. The JikesRVM from IBM Research is a virtual machine that currently employs a state-of-the-art, special-purpose OSR for deferred compilation and method promotion [10]. It is this system that we extend in this work.

The JikesRVM optimizing compiler inserts an instruction, called an OSRPoint, into the application code at the point at which OSR should be unconditionally performed. This instruction is implemented similarly to a call instruction – execution of an OSRPoint inserted in a method causes OSR to be performed for that method. The OSRPoint records the necessary execution state, which consists of values for bytecode-level local and stack variables, and the current program counter. The execution state is a mapping that provides information about runtime values at the bytecode level so that the method can be recompiled and execution restarted with another version. The JikesRVM OSR implementation is similar to that used in other systems for similar purposes [14, 23, 13, 19], although the authors of each refer to the instruction using different names, e.g., interrupt points [14], OPC_RECOMPILE instructions [23], and uncommon traps [19].

Prior OSR implementations restrict compiler optimization. Since the compiler considers all method variables (locals and stack) live at an OSRPoint, it artificially extends the live ranges of variables and limits the applicability of optimizations such as dead code elimination, load/store elimination, alias analysis, and copy/constant propagation. In addition, the compiler cannot move variable definitions around OSRPoints [14, 10].
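As a rough illustration of the four-step process described above, consider the following sketch; every type and method name here is an invented stand-in for a VM internal, not a JikesRVM API.

// Hypothetical outline of the four-step OSR process; all types are
// invented stand-ins for VM internals.
interface ExecutionState { int bytecodeIndex(); }
interface Activation { ExecutionState extractState(); }
interface CompiledCode { Activation buildFrame(ExecutionState state); }
interface Compiler { CompiledCode recompile(String methodName); }
interface Scheduler {
    void replaceAndResume(Activation oldFrame, Activation newFrame, int pc);
}

final class OsrDriver {
    private final Compiler compiler;
    private final Scheduler scheduler;

    OsrDriver(Compiler compiler, Scheduler scheduler) {
        this.compiler = compiler;
        this.scheduler = scheduler;
    }

    void onStackReplace(Activation oldFrame, String methodName) {
        ExecutionState state = oldFrame.extractState();        // 1. extract current state
        CompiledCode newCode = compiler.recompile(methodName); // 2. recompile the method
        Activation newFrame = newCode.buildFrame(state);       // 3. build and fill new frame
        scheduler.replaceAndResume(oldFrame, newFrame,         // 4. swap frames and resume
                                   state.bytecodeIndex());
    }
}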
3 Extending On-Stack Replacement
Existing implementations of OSR do not significantly degrade code quality when there are a small number of OSRPoints [10, 14]. However, our goal is to enable more aggressive, existing and novel specializations, including those triggered by events external to the executing code, e.g., class loading, exception handling, garbage collection optimization, and dynamic software updating. For such specializations, we may be required to perform OSR at all locations in a method at which execution can be suspended. Since there are a large number of such locations, many along the critical path of the program, we require an alternative implementation that enables optimization at, and across, these locations.

To this end, we extended the current OSR implementation in the JikesRVM with a mechanism that decouples the OSR implementation from the optimization process. In particular, we maintain execution state information without inserting extra instructions at every point at which control may transfer out of a method. This includes implicit yield points (method prologues, method epilogues, and loop back-edges), call sites, exception throws, and explicit yield points. The data structure we employ for state collection is called a VARMAP (short for variable map).

A VARMAP is a per-method list of thread-switch points and the bytecode variables that are live at each point. We maintain this list independent of the compiled code so it does not affect the liveness of code variables. We update the VARMAP incrementally as the compiler performs its optimizations.
[Figure 1. Shows how the VARMAP is maintained and updated: JikesRVM high-level intermediate code (HIR) for a method main, before and after copy propagation (instructions at bytecode indices 15, 18, 20, and 25, with a call to callme() at index 20), together with the VARMAP entry for the call site (25@main), which is updated via transferVarForOsr(l15i, l8i).]

Our VARMAP is somewhat similar in form to the data structure described in [9] for tracking pointer updates in the presence of compiler optimizations, to support garbage collection in Modula-3. Our implementation is different in that we track all stack, local, and temporary variables across a wide range of compiler optimizations automatically and transparently, and do so online, during dynamic optimization of Java programs.

To update VARMAP entries during optimization, we defined the following system methods (a sketch of one possible interface follows the list):
transferVarForOsr(var1, var2): Record that var2 will be used in place of var1, henceforth in the code (e.g., as a result of copy propagation).

removeVarForOsr(var): Record that var is no longer live/valid in the code.

replaceVarWithExpression(var, vars[], operators[]): Record that variable var has been replaced by an expression that is derivable from the set of variables vars[] using operators[].
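As a rough illustration, a VARMAP with these three update hooks might look like the following sketch; the entry representation and helper names are our assumptions based on the description above, not JikesRVM source.

import java.util.ArrayList;
import java.util.List;

// Sketch of a per-method VARMAP: one entry per point at which control
// may transfer out of the method, each holding the bytecode-level
// variables live there. Illustrative reconstruction only.
final class VarMap {
    // Placeholder recorded for dead variables so that the relative
    // positions of the remaining variables are preserved for OSR.
    static final Object VOID = new Object();

    static final class Entry {
        final int bytecodeIndex;      // where execution resumes after OSR
        final List<Object> variables; // live variables, in frame order
        Entry(int bytecodeIndex, List<Object> variables) {
            this.bytecodeIndex = bytecodeIndex;
            this.variables = variables;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called as the compiler records each potential OSR point.
    void addEntry(int bytecodeIndex, List<Object> liveVariables) {
        entries.add(new Entry(bytecodeIndex, liveVariables));
    }

    // Record that var2 will be used in place of var1 henceforth,
    // e.g., after copy propagation replaces uses of var1 with var2.
    void transferVarForOsr(Object var1, Object var2) {
        for (Entry e : entries) {
            int i = e.variables.indexOf(var1);
            if (i >= 0) e.variables.set(i, var2);
        }
    }

    // Record that var is no longer live/valid; the slot is kept as VOID
    // so relative positions remain correct when the frame is rebuilt.
    void removeVarForOsr(Object var) {
        for (Entry e : entries) {
            int i = e.variables.indexOf(var);
            if (i >= 0) e.variables.set(i, VOID);
        }
    }

    // Record that var is derivable from a simple (unary or binary)
    // expression over vars using the given operators.
    void replaceVarWithExpression(Object var, Object[] vars, String[] operators) {
        for (Entry e : entries) {
            int i = e.variables.indexOf(var);
            if (i >= 0) e.variables.set(i, new Object[] { vars, operators });
        }
    }
}

In this sketch, the compiler's wrapper functions would call transferVarForOsr and its companions whenever an optimization rewrites a variable, which is exactly the mechanism the following paragraphs describe.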
Our OSR-enabled compilation system handles all JikesRVM optimizations that impact liveness except tail call and tail recursion elimination (which eliminate stack frames entirely). This set of optimizations includes copy and constant propagation, common subexpression elimination (CSE), branch optimizations, dead-code elimination (DCE), and local escape analysis optimization.

When a variable is updated, the compiler calls a wrapper function that automatically invokes the appropriate VARMAP functions. This enables users to easily extend the compiler with new optimizations without having to manually handle VARMAP updates. For example, when copy/constant propagation or CSE replaces a use of a variable with another variable or constant, the wrapper performs the replacement in the VARMAP entry by invoking the transferVarForOsr function.

We handle DCE and local escape analysis using a similar wrapper function for updates to variable definitions. When definitions of dead variables that are present in the VARMAP are removed during optimization, the wrapper replaces the variable's entry with the right-hand side of the instruction, or, in the case of multiple right-hand-side variables, records all of the right-hand variables along with the operators used to derive the value (we currently handle only simple unary or binary expressions). Similarly, we replace variables eliminated by escape analysis with their values in the VARMAP.

The compiler automatically updates the VARMAP during live variable analysis. We record variables that are no longer live at an OSR point, and the relative position of each in the map. Every variable that live analysis discovers as dead is set to a void type in the VARMAP. We cannot simply drop the variable from the entry since we must track the relative positions of variables in order to enable their restoration in the correct location during OSR. During register allocation, we update the VARMAP with physical register and spill locations. This enables us to restore these locations during OSR as well.
Figure 1 shows how we maintain and update VARMAP entries. We show the VARMAP before and after copy propagation. We show a snippet of Java bytecode (left) and the high-level JikesRVM intermediate code representation (HIR). We also show the VARMAP entry (bottom) for the callme() call site, which contains the bytecode index (25) of the instruction that follows the call site, as well as three local (l) variables with integer (i) types (a: l8i, b: l15i, c: l17i). The index identifies the place in the code at which execution resumes following the call (in this example), if OSR occurs during callme(). The VARMAP tracks method variables (l8i, l15i, and l17i) at this point; this entry is used by the system to update the frame of the new version of main during OSR.

In the HIR before optimization, the instruction at bytecode index 15 copies variable l8i into l15i; both are then used in the subsequent code. The optimization replaces all l15i uses with l8i during copy propagation. During this optimization, the replacement invokes a wrapper which automatically updates the VARMAP, i.e., the wrapper calls transferVarForOsr(l15i, l8i) prior to replacement. DCE will remove bytecode instruction 15 in a later pass. DCE of the instruction at index 15 will cause the VARMAP to record that l15i is no longer live.

When the compilation of a method completes, we encode the VARMAP using a compact encoding for OSRPoints from the existing implementation [10]. The encoded map contains an entry for each OSR point, which consists of the
register map – a bitmap that indicates which physical registers contain references (which the garbage collector may update). In addition, the map contains the current program counter (bytecode index), and a list of pairs (local variable, location) (each pair
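Based on the description above, one encoded OSR-point entry might be laid out as in the following sketch; the field names and types are assumptions, since the preview cuts off before the rest of the encoding details.

// Illustrative layout of one encoded OSR-point entry, following the
// description above; field names and types are assumptions.
final class EncodedOsrEntry {
    int registerRefBitmap;    // bit i set => physical register i holds a
                              // reference that the garbage collector may update
    int bytecodeIndex;        // current program counter (bytecode index)
    int[] localVariables;     // pairs: localVariables[k] is stored at ...
    int[] locations;          // ... locations[k] (a register or spill slot)
}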
