A Comparison of Virtual Machines

Published by: Computer Guru on Sep 28, 2008
Copyright:Attribution Non-commercial


Stacking them up: a Comparison of Virtual Machines
K John Gough
A popular trend in current software technology is to gain program portability by compiling programs to an intermediate form based on an abstract machine definition. Such approaches date back at least to the 1970s, but have achieved new impetus based on the current popularity of the programming language Java. Implementations of the Java language compile programs to bytecodes understood by the Java Virtual Machine (JVM). More recently Microsoft have released preliminary details of their “.NET” platform, which is based on an abstract machine superficially similar to the JVM. In each case program execution is normally mediated by a just-in-time compiler (JIT), although in principle interpretative execution is also possible. Although these two competing technologies share some common aims, the objectives of the virtual machine designs are significantly different. In particular, the ease with which embedded systems might use small-footprint versions of these virtual machines depends on detailed properties of the machine definitions. In this study, a compiler was implemented which can produce output code that may be run on either the JVM or .NET platforms. The compiler is available in the public domain, and allows comparisons to be made both at compile time and at runtime.
1 Introduction
1.1 Abstract Stack Machines
The idea of using an intermediate form within a programming language compiler, as a means of communication between the front-end and back-end, dates back at least to the 1970s. The idea is quite straightforward. Language-dependent front-ends compile and semantically check programs, passing an intermediate language representation of the program to the code-generating back-end. In an ideal situation the front-end would be entirely independent of the target hardware, while the back-end would be sensibly independent of the particular language in which the source program was written. In this way the task of writing compilers for N languages on M machine architectures is factored into N + M part-compilers rather than N × M complete compilers.
Author's address: Queensland University of Technology, Box 2434 Brisbane 4001, Australia
Many of these intermediate language representations were based on abstract stack machines. One particular representation, P-Code, was invented as an intermediate form for the ETH Pascal Compilers[1], but became pervasive as the machine code for the UCSD Pascal System. What had been noted by the UCSD people was that a program encoded for an abstract stack machine may be used in two ways: a compiler back-end may compile the code down to the machine language of the actual target machine, or an interpreter may be written which emulates the abstract machine on the target. This interpretative approach surrenders a significant factor of speed, but has the advantage that programs are much more dense in the abstract machine encoding. In the case of UCSD Pascal the code was so compact that the compilers could be run on the 4k or so of memory available on the very first microcomputers. As a consequence of this technology high-level languages became available for the first time on microcomputers. As an added benefit, the task of porting a language system to a new machine reduced to the relatively simple task of creating a new interpreter on the new machine.
The use of abstract machines as compiler intermediate forms has also had its adherents. For example, the Gardens Point compilers all use a stack intermediate form (D-Code) for all of the languages and platforms supported by the system[2]. Although most implementations are fully compiled, a special lightweight interpreted version of the system was written in about 1990 for the Intel architecture, allowing users with a humble IBM XT to produce the same results as the 32-bit UNIX platforms that the other implementations supported[3]. As a measure of the complexity of the virtual machine emulator, the interpreter was about 1k lines of assembly language, with the floating point emulator a further 1k lines.
A largely failed attempt to leverage the portability properties of stack intermediate forms was the Open Software Foundation's Architecture Neutral Distribution Form (ANDF). The idea behind ANDF was to distribute programs in an intermediate form, and complete the task of compilation during an installation step. The ANDF form was code for an abstract stack machine, but one with a slight twist. Generators of intermediate forms such as D-Code know enough about the target's addressing constraints to be able to resolve (say) record field accesses to address offsets. In the case of ANDF the target is not yet determined at the time of compilation, so that all such accesses must remain symbolic. It has been suggested that this incorporation of symbolic information into the distributed form was considered to be a threat to intellectual property rights by the major software companies, and was a factor in the failure of the form to achieve widespread adoption.
In the late 1990s Sun Microsystems released their Java[4] language system. This system is, once again, based on an abstract stack machine and, like ANDF, relies on the presence of symbolic information to allow such things as field offsets to be resolved at deployment time. In the case of Java and the Java Virtual Machine[5] (JVM) the “problem” of symbolic content turned out to be a virtue. The presence of the symbolic information is the thing which allows deployment-time and runtime enforcement of the type system via the so-called bytecode verifier. These runtime type safety guarantees are the basis on which applet security is founded. As things now stand, JVMs are available for almost all computing platforms, and Java tells a program portability story which transcends almost all other vehicles.
In mid-2000 Microsoft revealed a new technology based on a wider use of the world wide web for service delivery. This technology became known as the .NET system. The technology has many components, but all of it depends on a runtime which is object-oriented and fully garbage collected. The runtime processes an intermediate form which, like that of the JVM, is based on an abstract stack machine. Apart from this common structure, the detailed design of the two machines is quite different.
During 1999 the author had explored the applicability of the JVM as a target for languages other than Java. As a result of this a prototype compiler for the language Component Pascal[6] was written. This compiler translates Component Pascal programs into JVM bytecodes. The prototype was written in Java. During the first half of 2000 Paul Roe and the author were given the opportunity to work under NDA on the then unannounced .NET platform. Building on the experience of the prototype, an entirely new Component Pascal compiler was written, this time implemented in Component Pascal. The new compiler has two separate code emitters. One produces JVM bytecodes, while the other produces .NET intermediate language. The compiler may be bootstrapped on either platform. The existence of these two parallel code generators allows side-by-side comparisons to be made between the two platforms.
The remainder of this paper is organised as follows: Section 2 gives a brief overview of the Java Virtual Machine, while Section 3 gives an overview of the .NET execution engine. Section 4 discusses the detailed differences between the two abstract machines, and introduces some performance comparisons. Finally, Section 5 draws some conclusions and offers some tentative predictions of future directions.
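As a concrete illustration of the interpretative approach described in this introduction, the following minimal Java sketch emulates a tiny abstract stack machine. The class name `TinyStackMachine` and the opcodes `PUSH`, `LOAD`, `STORE`, `ADD` and `HALT` are hypothetical, chosen only for illustration; they do not correspond to P-Code, D-Code or any other real instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyStackMachine {
    // Hypothetical opcodes, for illustration only.
    static final int PUSH = 0;   // PUSH k  : push the constant k
    static final int LOAD = 1;   // LOAD n  : push local variable n
    static final int STORE = 2;  // STORE n : pop the top of stack into local variable n
    static final int ADD = 3;    // ADD     : pop two values, push their sum
    static final int HALT = 4;   // HALT    : stop and return the local variables

    // Interprets the program and returns the final local variable values.
    static int[] run(int[] code, int localCount) {
        int[] locals = new int[localCount];
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case LOAD:
                    stack.push(locals[code[pc++]]);
                    break;
                case STORE:
                    locals[code[pc++]] = stack.pop();
                    break;
                case ADD: {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case HALT:
                    return locals;
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // locals[1] = 2; locals[2] = 3; locals[3] = locals[1] + locals[2]
        int[] program = {
            PUSH, 2, STORE, 1,
            PUSH, 3, STORE, 2,
            LOAD, 1, LOAD, 2, ADD, STORE, 3,
            HALT
        };
        System.out.println(run(program, 4)[3]);
    }
}
```

Porting such a system to a new machine amounts to rewriting only the `run` loop for that machine, which is the portability benefit noted for UCSD Pascal.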
2 The Java Virtual Machine
The underlying execution mechanism of the JVM is an evaluation stack, and a set of instructions which manipulate this stack. As a first example, the code required to take two local integer variables, add them together and deposit the result in a third local variable would be:
    iload_1     ; push local int variable 1
    iload_2     ; push local int variable 2
    iadd        ; add the two top elements
    istore_3    ; pop result into variable 3
Note the use of the i-prefix on all of these instructions, encoding the fact that these all operate on integers. Notice also that in this case the index of the local variable is encoded into the instruction, at least for the lowest numbered few variables. We might therefore expect the code to be quite dense with each of the instructions requiring only one byte.
The instruction set of the JVM is designed with the sole purpose of representing Java programs. There is thus direct support for the object model of Java, and for the various kinds of method dispatch that are required. In particular, the instruction set allows for classes to inherit behaviour from just one superclass, but to declare that they implement multiple fully abstract class specifications (i.e. “interfaces”).
At runtime, data is represented in just two ways. Scalar data may exist as local variables, in fields of structures, or on the evaluation stack of the abstract machine. Aggregate data exists only in dynamically allocated objects which are automatically collected when they are no longer accessible. References to these objects may be stored in local variables, in fields or on the evaluation stack as with other scalars. There is no union construct.
During the execution of a method, the evaluation stack consists of a finite stack of “slots” the depth of which is statically determined by the compiler. Each of these slots may contain an object reference or a 32-bit scalar value. Long integers and floating point double values use up two slots.
2.1 Class Files
At deployment time a Java program is represented by a set of one or more dynamically loaded class files. These files contain a specification of the behaviour of the class, including its external contracts. The features of the class are named in an indexed constant pool, and all references to these features are mediated via references to the constant pool indices.
This symbolic information allows for a significant degree of type-checking to take place at load time, with a small amount of runtime checking still required. This, together with the absence of instructions which manipulate addresses, ensures that programs encoded for the JVM will be free from certain kinds of type errors at runtime. This is a necessary foundation for the kind of security which users require before executing code from untrusted sources.
As it turns out, the presence of symbolic information reduces the code density of programs so that typically they are comparable in size with native code object files for complete programs. Although each instruction requires only one byte, or one byte plus a constant pool index, the constant pool itself takes up a significant amount of space. Of course, class files tend to be textually repetitious, so that they compress quite readily for data transport.
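To make the role of the constant pool a little more concrete, the following Java sketch reads the fixed-size header that precedes the constant pool in a class file: the magic number, the minor and major version numbers, and the constant pool count. The class name `ClassFileHeader` and the hand-built sample bytes are assumptions made for illustration; the sketch does not parse the constant pool entries themselves.

```java
import java.nio.ByteBuffer;

public class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the first ten bytes of a class file image (big-endian, as ByteBuffer is by default).
    static ClassFileHeader read(byte[] classFileBytes) {
        ByteBuffer in = ByteBuffer.wrap(classFileBytes);
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        // The stored count is one greater than the number of constant pool entries.
        int count = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major, count);
    }

    public static void main(String[] args) {
        // Hand-built sample header, not taken from a real class file.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0, 0,      // minor version 0
            0, 52,     // major version 52
            0, 16      // constant pool count 16
        };
        ClassFileHeader h = read(sample);
        System.out.println(h.majorVersion + "." + h.minorVersion
                + ", constant pool count " + h.constantPoolCount);
    }
}
```

Everything an instruction names symbolically (classes, fields, methods, string and numeric constants) lives in the entries that follow this header, which is why the pool accounts for so much of the file size.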
2.2 Parameter Passing
There are four different method invocation instructions, for static methods, virtual methods, interface methods, and virtual methods invoked statically. In each case the method may take any number of parameters, passed by value. In all but the static case, the methods take a this receiver, which appears as the zero-th parameter to the callee. Methods may return a single result.
In all cases, actual parameter values are pushed onto the evaluation stack prior to the call, and a returned result appears on the top of the evaluation stack on the return. Incoming values appear as the first local variables in the callee.
Since only scalar values and references may be pushed on the stack, it follows that these are the only possible parameter types. However, since both arrays and structures only exist as dynamically allocated objects accessed by reference, this is no limitation for Java programs.
As has been noted elsewhere[7, 8] the semantics of parameter passing in the JVM create a limitation in the efficiency with which languages other than Java can be implemented on this machine.
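The limitation can be seen from the Java side in a small sketch. A language with reference (VAR) parameters must somehow pass a local scalar so that the callee can update it, but the JVM passes only values. One commonly used workaround, shown here purely as an assumption and not necessarily the scheme used by the compiler described in this paper, is to box the variable in a one-element array and copy the result back after the call. The class and method names are hypothetical.

```java
public class VarParamDemo {
    // Parameters are passed by value: this assignment changes only the callee's copy.
    static void incrementByValue(int x) {
        x = x + 1;
    }

    // Workaround sketch: the callee updates the caller's variable through a boxed copy.
    static void incrementBoxed(int[] box) {
        box[0] = box[0] + 1;
    }

    static int demoByValue() {
        int n = 41;
        incrementByValue(n);
        return n;              // still 41
    }

    static int demoBoxed() {
        int n = 41;
        int[] box = { n };     // allocate a box and copy the value in
        incrementBoxed(box);
        n = box[0];            // copy the value back out
        return n;              // now 42
    }

    public static void main(String[] args) {
        System.out.println(demoByValue());
        System.out.println(demoBoxed());
    }
}
```

The allocation and the two copies on every such call are the kind of overhead that makes reference parameters comparatively expensive on this machine.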
3 The .NET Execution Engine
The underlying execution mechanism of .NET is an evaluation stack, and a set of instructions which manipulate this stack. To take the same example, the code required to take two local integer variables, add them and deposit the result in a third local variable would be:
    ldloc.1     ; push local variable 1
    ldloc.2     ; push local variable 2
    add         ; add the two top elements
    stloc.3     ; pop result into variable 3
These instructions are all generic, with the type of the “add” being determined from the inferred type of the stack contents. In this case the type will be known from the declared types of the local variables. Indeed, the instruction sequence would be identical if the three variables were all declared as floating point double type. Notice that for this platform also, the index of the local variable is encoded into the instruction, at least for the lowest numbered variables.
The instruction set of .NET is designed with the objective of supporting multiple languages, and thus needs to support all of the constructs of what Microsoft calls the Virtual Object System (VOS). The object model supports several kinds of method dispatch. There are three kinds of methods: static methods, and instance methods which may be either virtual or non-virtual. As with Java, .NET reference classes are permitted to inherit behaviour from just one superclass, but may declare that they implement multiple fully abstract class specifications (i.e. “interfaces”).
At runtime data exists as scalars, as references, and as instances of value classes. There is a fundamental distinction made between value and reference classes. Value classes do not inherit behaviour, and cannot have virtual methods. As the name implies, assignment of such values has value (non-aliasing) semantics. In the newly announced language C#, structs are implemented as value classes, while classes are implemented as reference classes in .NET.
Value class instances may be statically allocated, automatically allocated at method entry, or be boxed in dynamically allocated objects. The instruction set has support for boxing and unboxing of such values. Dynamically allocated objects are garbage collected when no longer accessible. As with the JVM there is no union construct.
During the execution of a method, the evaluation stack consists of a finite stack of abstract values. The depth of the stack is statically determined by the compiler. Each abstract stack element may contain any value, including an instance of a value class. Unlike the JVM, no value uses up multiple elements.
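The way in which a generic “add” acquires its type can be sketched as a simulation over types rather than values. The following Java fragment is a deliberately simplified illustration with hypothetical names (`GenericAddInference`, `Type`, `typeOfAdd`); it models only two types and is not a description of the actual rules applied by the .NET runtime.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GenericAddInference {
    // Only two types are modelled in this sketch.
    enum Type { INT32, FLOAT64 }

    // Simulates "ldloc first; ldloc second; add" over types rather than values,
    // returning the type inferred for the result of the add.
    static Type typeOfAdd(Type[] localTypes, int first, int second) {
        Deque<Type> stack = new ArrayDeque<>();
        stack.push(localTypes[first]);    // ldloc: push the declared type of the local
        stack.push(localTypes[second]);   // ldloc
        Type right = stack.pop();         // add: pop both operand types
        Type left = stack.pop();
        if (left != right) {
            throw new IllegalStateException("operand types differ: " + left + ", " + right);
        }
        stack.push(left);                 // the result has the operands' common type
        return stack.peek();
    }

    public static void main(String[] args) {
        Type[] ints = { Type.INT32, Type.INT32, Type.INT32, Type.INT32 };
        Type[] doubles = { Type.FLOAT64, Type.FLOAT64, Type.FLOAT64, Type.FLOAT64 };
        // The same instruction sequence yields a different add in each case.
        System.out.println(typeOfAdd(ints, 1, 2));
        System.out.println(typeOfAdd(doubles, 1, 2));
    }
}
```

On the JVM the corresponding decision is made by the compiler, which must choose between typed instructions such as `iadd` and `dadd` when the code is generated.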
3.1 Assembly Files
At deployment time a .NET program is represented by a set of one or more dynamically loaded assembly files
