An introduction to the basic structure and lifestyle of the Java class file
Welcome to another installment of "Under the Hood." In last month's article I discussed
the Java Virtual Machine, or JVM, the abstract computer for which all Java programs are
compiled. If you are unfamiliar with the JVM, you may want to read last month's article
before this one. In this article I provide a glimpse into the basic structure and lifestyle of
the Java class file.
Born to travel
The Java class file is a precisely defined format for compiled Java. Java source code is
compiled into class files that can be loaded and executed by any JVM. The class files
may travel across a network before being loaded by the JVM.
In fact, if you are reading this article via a Java-capable browser, class files for the
simulation applet at the end of the article are flying across the Internet to your computer
right now. If you'd like to listen in on them (and your computer has audio capability),
push the following button:
Sounds like they're having fun, huh? That's in their nature. Java class files were designed
to travel well. They are platform-independent, so they will be welcome in more places.
They contain bytecodes, the compact instruction set for the JVM, so they can travel light.
Java class files are constantly zipping through networks at breakneck speed to arrive at
JVMs all over the world.
The Java class file contains everything a JVM needs to know about one Java class or
interface. In their order of appearance in the class file, the major components are:
magic, version, constant pool, access flags, this class, super class, interfaces, fields,
methods, and attributes.
Information stored in the class file often varies in length -- that is, the actual length of the
information cannot be predicted before loading the class file. For instance, the number of
methods listed in the methods component can differ among class files, because it depends
on the number of methods defined in the source code. Such information is organized in
the class file by prefacing the actual information by its size or length. This way, when the
class is being loaded by the JVM, the size of variable-length information is read first.
Once the JVM knows the size, it can correctly read in the actual information.
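The length-prefix pattern is easy to see in code. Below is a minimal Java sketch, not taken from any actual JVM, that reads a two-byte count followed by exactly that many two-byte constant pool indices. The interfaces component described later uses this layout. The names `CountPrefixedReader` and `readU2Array` are hypothetical. `DataInputStream.readUnsignedShort()` is the standard library call that reads two bytes, high-order byte first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reads a length-prefixed array: a two-byte count, then that many two-byte entries.
class CountPrefixedReader {
    static int[] readU2Array(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();      // the size is read first...
        int[] items = new int[count];
        for (int i = 0; i < count; i++) {
            items[i] = in.readUnsignedShort();   // ...so the reader knows how many entries follow
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: count 2, indices 5 and 9, then one byte of the next component.
        byte[] bytes = {0x00, 0x02, 0x00, 0x05, 0x00, 0x09, 0x7F};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int[] indices = readU2Array(in);
        System.out.println(indices.length + " entries; next byte = " + in.readUnsignedByte());
    }
}
```

Running `main` prints `2 entries; next byte = 127`. Because the reader stops exactly where the counted data ends, the next component is read from the right place.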
Information is generally written to the class file with no space or padding between
consecutive pieces of information; everything is aligned on byte boundaries. This helps
keep class files petite so they will be aerodynamic as they fly across networks.
The order of class file components is strictly defined so JVMs can know what to expect,
and where to expect it, when loading a class file. For example, every JVM knows that the
first eight bytes of a class file contain the magic and version numbers, that the constant
pool starts on the ninth byte, and that the access flags follow the constant pool. But
because the constant pool is variable-length, it doesn't know the exact whereabouts of the
access flags until it has finished reading in the constant pool. Once it has finished reading
in the constant pool, it knows the next two bytes will be the access flags.
The first four bytes of every class file are always 0xCAFEBABE. This magic number
makes Java class files easier to identify, because the odds are slim that non-class files
would start with the same initial four bytes. The number is called magic because it can be
pulled out of a hat by the file format designers. The only requirement is that it is not
already being used by another file format that may be encountered in the real world.
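A loader can check the magic number by reading the first four bytes as one big-endian `int`. This is a sketch that assumes the class file is available as a `DataInputStream`. `MagicCheck` and `hasClassFileMagic` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Checks whether a stream begins with the class file magic number.
class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;   // fits in a Java int as a negative value

    static boolean hasClassFileMagic(DataInputStream in) throws IOException {
        return in.readInt() == MAGIC;      // readInt() consumes exactly four bytes
    }

    public static void main(String[] args) throws IOException {
        byte[] classFile = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        byte[] zipFile = {'P', 'K', 3, 4}; // a ZIP archive starts with these bytes
        System.out.println(hasClassFileMagic(new DataInputStream(new ByteArrayInputStream(classFile))));
        System.out.println(hasClassFileMagic(new DataInputStream(new ByteArrayInputStream(zipFile))));
    }
}
```

The first call prints `true` and the second prints `false`.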
According to Patrick Naughton, a key member of the original Java team, the magic
number was chosen "long before the name Java was ever uttered in reference to this
language. We were looking for something fun, unique, and easy to remember. It is only a
coincidence that 0xCAFEBABE, an oblique reference to the cute baristas at Peet's
Coffee, was foreshadowing for the name Java."
The second four bytes of the class file contain the minor and major version numbers, in
that order.
These numbers identify the version of the class file format to which a particular class file
adheres and allow JVMs to verify that the class file is loadable. Every JVM has a
maximum version it can load, and JVMs will reject class files with later versions.
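In the file the two-byte minor version comes first, followed by the two-byte major version. The sketch below reads the two numbers in that order and compares the pair against a maximum, ordering by major number first. The 45.3 ceiling and the names are hypothetical and were chosen to match the Act.class file used later in this article. Real JVMs have their own limits.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reads the version numbers that follow the magic number and applies a
// "reject anything later than my maximum" rule.
class VersionCheck {
    // Hypothetical maximum supported version, chosen to match Act.class (45.3).
    static final int MAX_MAJOR = 45;
    static final int MAX_MINOR = 3;

    // Versions are ordered by major number first, then by minor number.
    static boolean isLoadable(int major, int minor) {
        if (major != MAX_MAJOR) {
            return major < MAX_MAJOR;
        }
        return minor <= MAX_MINOR;
    }

    public static void main(String[] args) throws IOException {
        // Bytes 5 through 8 of Act.class: minor_version 3, then major_version 45 (0x2D).
        byte[] bytes = {0x00, 0x03, 0x00, 0x2D};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int minor = in.readUnsignedShort();   // the minor version is stored first
        int major = in.readUnsignedShort();
        System.out.println(major + "." + minor + " loadable: " + isLoadable(major, minor));
    }
}
```

For Act.class this prints `45.3 loadable: true`. A 46.0 file would be rejected by this hypothetical loader, and a 44.9 file would be accepted.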
Constant pool
The class file stores constants associated with its class or interface in the constant pool.
Some constants that may be seen frolicking in the pool are literal strings, final variable
values, class names, interface names, variable names and types, and method names and
signatures. A method signature is its return type and set of argument types.
Each element of the constant pool starts with a one-byte tag specifying the type of
constant at that position in the array. Once a JVM grabs and interprets this tag, it knows
what follows the tag. For example, if a tag indicates the constant is a UTF-8 string
(CONSTANT_Utf8), the JVM expects the next two bytes to be the string length. Following
this two-byte length, the JVM expects to find length bytes, which make up the characters
of the string.
In the remainder of the article I'll sometimes refer to the nth element of the constant pool
array as constant_pool[n]. This makes sense to the extent the constant pool is organized
like an array, but bear in mind that these elements have different sizes and types and that
the first element has an index of one.
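The following sketch shows how a loader might walk the constant pool by switching on each entry's tag. It handles only the classic tags (1 and 3 through 12) and rejects anything else. It stores entries in a plain `Object[]` indexed from one, which is a simplification chosen for brevity. It uses `DataInputStream.readUTF()`, whose two-byte-length-prefixed "modified UTF-8" format matches the one CONSTANT_Utf8 entries use. Eight-byte constants (longs and doubles) occupy two slots of the pool, so the index skips ahead after them. Class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reads a constant pool by dispatching on each entry's one-byte tag.
// Representation (a simplification): Utf8 -> String, numbers -> boxed values,
// reference entries -> int[] {tag, index, ...}.
class ConstantPoolReader {
    static final int UTF8 = 1, INTEGER = 3, FLOAT = 4, LONG = 5, DOUBLE = 6,
            CLASS = 7, STRING = 8, FIELDREF = 9, METHODREF = 10,
            INTERFACE_METHODREF = 11, NAME_AND_TYPE = 12;

    // The returned array is indexed like the class file: slot 0 stays empty.
    static Object[] read(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();               // constant_pool_count
        Object[] pool = new Object[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();              // tells us what follows
            switch (tag) {
                case UTF8:
                    pool[i] = in.readUTF();               // two-byte length, then the bytes
                    break;
                case CLASS:
                case STRING:
                    pool[i] = new int[] {tag, in.readUnsignedShort()};
                    break;
                case FIELDREF:
                case METHODREF:
                case INTERFACE_METHODREF:
                case NAME_AND_TYPE:
                    pool[i] = new int[] {tag, in.readUnsignedShort(), in.readUnsignedShort()};
                    break;
                case INTEGER:
                    pool[i] = in.readInt();
                    break;
                case FLOAT:
                    pool[i] = in.readFloat();
                    break;
                case LONG:
                    pool[i] = in.readLong();
                    i++;                                  // eight-byte constants take two slots
                    break;
                case DOUBLE:
                    pool[i] = in.readDouble();
                    i++;
                    break;
                default:
                    throw new IOException("unknown tag " + tag + " at index " + i);
            }
        }
        return pool;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical pool: [1] CONSTANT_Class with name_index 2, [2] CONSTANT_Utf8 "Act".
        byte[] bytes = {0x00, 0x03, 0x07, 0x00, 0x02, 0x01, 0x00, 0x03, 'A', 'c', 't'};
        Object[] pool = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        int[] classEntry = (int[]) pool[1];
        System.out.println("constant_pool[1] -> constant_pool[" + classEntry[1] + "] = " + pool[2]);
    }
}
```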
Access flags
The first two bytes after the constant pool, the access flags, indicate whether or not this
file defines a class or an interface, whether the class or interface is public or abstract, and
(if it's a class and not an interface) whether the class is final.
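Checking these flags is a matter of masking bits. The constants below use the values that the class file format assigns to these flags (ACC_PUBLIC 0x0001, ACC_FINAL 0x0010, ACC_INTERFACE 0x0200, ACC_ABSTRACT 0x0400). The class and method names are hypothetical.

```java
// Decodes the access flags of a class or interface into Java-like modifiers.
class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        if ((flags & ACC_PUBLIC) != 0) sb.append("public ");
        if ((flags & ACC_FINAL) != 0) sb.append("final ");
        if ((flags & ACC_ABSTRACT) != 0) sb.append("abstract ");
        sb.append((flags & ACC_INTERFACE) != 0 ? "interface" : "class");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0000));                  // Act's flags: "class"
        System.out.println(describe(ACC_PUBLIC | ACC_FINAL));  // 0x0011: "public final class"
    }
}
```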
This class
The next two bytes, the this class component, are an index into the constant pool array.
The constant referred to by this class, constant_pool[this_class], has two parts, a one-byte
tag and a two-byte name index. The tag will equal CONSTANT_Class, a value that
indicates this element contains information about a class or interface.
Constant_pool[name_index] is a string constant containing the name of the class or
interface.
The this class component provides a glimpse of how the constant pool is used. This class
itself is just an index into the constant pool. When a JVM looks up
constant_pool[this_class], it finds an element that identifies itself as a
CONSTANT_Class with its tag. The JVM knows CONSTANT_Class elements always
have a two-byte index into the constant pool, called name index, following their one-byte
tag. So it looks up constant_pool[name_index] to get the string containing the name of
the class or interface.
Super class
Following the this class component is the super class component, another two-byte index
into the constant pool. Constant_pool[super_class] is a CONSTANT_Class element that
points to the name of the super class from which this class descends.
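The two-step lookup used for both this class and super class takes only a few lines of code. In this sketch the constant pool is simplified to an `Object[]` in which a CONSTANT_Class entry is just its `Integer` name index and a CONSTANT_Utf8 entry is a `String`. That representation and the names are assumptions. The indices come from Act.class, the file loaded by the simulation at the end of this article: constant_pool[2] is the class entry for Act and constant_pool[1] is the entry for its superclass. Class names are stored with slashes in place of dots.

```java
// Resolves a CONSTANT_Class index (such as this_class or super_class) to a name.
// Simplified pool: a CONSTANT_Class entry is stored as its Integer name_index,
// and a CONSTANT_Utf8 entry is stored as a String.
class ThisClassLookup {
    static String className(Object[] pool, int classIndex) {
        Object entry = pool[classIndex];                  // constant_pool[this_class]
        if (!(entry instanceof Integer)) {
            throw new IllegalArgumentException("not a class entry: " + classIndex);
        }
        int nameIndex = (Integer) entry;                  // its name_index
        return (String) pool[nameIndex];                  // constant_pool[name_index]
    }

    // The entries of Act.class's constant pool that these lookups touch.
    static Object[] actPoolExcerpt() {
        Object[] pool = new Object[17];                   // constant_pool_count is 17
        pool[1] = 12;                                     // CONSTANT_Class, name_index 12
        pool[2] = 13;                                     // CONSTANT_Class, name_index 13
        pool[12] = "java/lang/Object";
        pool[13] = "Act";
        return pool;
    }

    public static void main(String[] args) {
        Object[] pool = actPoolExcerpt();
        System.out.println("this class:  " + className(pool, 2));  // this_class = 2
        System.out.println("super class: " + className(pool, 1));  // super_class = 1
    }
}
```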
Interfaces
The interfaces component starts with a two-byte count of the number of interfaces
implemented by the class (or interface) defined in the file. Immediately following is an
array that contains one index into the constant pool for each interface implemented by the
class. Each interface is represented by a CONSTANT_Class element in the constant pool
that points to the name of the interface.
Fields
The fields component starts with a two-byte count of the number of fields in this class or
interface. A field is an instance or class variable of the class or interface. Following the
count is an array of variable-length structures, one for each field. Each structure reveals
information about one field such as the field's name, type, and, if it is a final variable, its
constant value. Some information is contained in the structure itself, and some is
contained in constant pool locations pointed to by the structure.
The only fields that appear in the list are those that were declared by the class or interface
defined in the file; no fields inherited from superclasses or superinterfaces appear in the
list.
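Fields and methods share the same layout at the front of each structure: two-byte access flags, name index, descriptor index, and attribute count, followed by the variable-length attributes. The sketch below reads that fixed part only. Act declares no fields, so the sample bytes are the first eight bytes of its methods[0] entry, as listed in the simulation text at the end of this article. `MemberHeader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reads the fixed-size front of a field_info structure (method_info starts the same way).
class MemberHeader {
    final int accessFlags;
    final int nameIndex;        // constant pool index of the member's name
    final int descriptorIndex;  // constant pool index of its type descriptor
    final int attributesCount;  // number of variable-length attributes that follow

    MemberHeader(DataInputStream in) throws IOException {
        accessFlags = in.readUnsignedShort();
        nameIndex = in.readUnsignedShort();
        descriptorIndex = in.readUnsignedShort();
        attributesCount = in.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        // The first eight bytes of methods[0] (doMathForever) in Act.class.
        byte[] bytes = {0x00, 0x09, 0x00, 0x06, 0x00, 0x10, 0x00, 0x01};
        MemberHeader h = new MemberHeader(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println("access_flags=0x" + Integer.toHexString(h.accessFlags)
                + " name_index=" + h.nameIndex + " descriptor_index=" + h.descriptorIndex
                + " attributes_count=" + h.attributesCount);
    }
}
```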
Methods
The methods component starts with a two-byte count of the number of methods in the
class or interface. This count includes only those methods that are explicitly defined by
this class, not any methods that may be inherited from superclasses. Following the
method count are the methods themselves.
The structure for each method contains several pieces of information about the method,
including the method descriptor (its return type and argument list), the number of stack
words required for the method's local variables, the maximum number of stack words
required for the method's operand stack, a table of exceptions caught by the method, the
bytecode sequence, and a line number table.
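In the class file format, the descriptor index sits in the method's own structure. The max stack and max locals values, the bytecodes, and the exception table live inside a "Code" attribute attached to the method, and the line number table is an attribute nested inside that Code attribute. The sketch below reads the start of a Code attribute body using the bytes for doMathForever() shown in the simulation text. The class and field names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Reads the start of a Code attribute body (the bytes after attribute_length).
class CodeAttribute {
    int maxStack;
    int maxLocals;
    byte[] code;
    int[][] exceptionTable;   // rows of {start_pc, end_pc, handler_pc, catch_type}

    static CodeAttribute read(DataInputStream in) throws IOException {
        CodeAttribute c = new CodeAttribute();
        c.maxStack = in.readUnsignedShort();
        c.maxLocals = in.readUnsignedShort();
        int codeLength = in.readInt();            // four-byte code_length
        c.code = new byte[codeLength];
        in.readFully(c.code);                     // the bytecodes themselves
        int handlers = in.readUnsignedShort();    // exception_table_length
        c.exceptionTable = new int[handlers][4];
        for (int[] row : c.exceptionTable) {
            for (int j = 0; j < 4; j++) {
                row[j] = in.readUnsignedShort();
            }
        }
        // The Code attribute's own attributes (such as LineNumberTable) follow; not read here.
        return c;
    }

    public static void main(String[] args) throws IOException {
        // doMathForever()'s Code attribute body from Act.class, through its exception table.
        byte[] bytes = {
            0x00, 0x02, 0x00, 0x01,                        // max_stack 2, max_locals 1
            0x00, 0x00, 0x00, 0x0C,                        // code_length 12
            0x03, 0x3B, (byte) 0x84, 0x00, 0x01, 0x1A,     // bytecodes...
            0x05, 0x68, 0x3B, (byte) 0xA7, (byte) 0xFF, (byte) 0xF9,
            0x00, 0x00                                     // exception_table_length 0
        };
        CodeAttribute c = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println("max_stack=" + c.maxStack + " max_locals=" + c.maxLocals
                + " code bytes=" + c.code.length + " handlers=" + c.exceptionTable.length);
    }
}
```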
Attributes
Bringing up the rear are the attributes, which give general information about the
particular class or interface defined by the file. The attributes section has a two-byte
count of the number of attributes, followed by the attributes themselves. For example,
one attribute is the SourceFile attribute; it reveals the name of the source file from which
this class file was compiled. JVMs will silently ignore any attributes they don't recognize.
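Ignoring unknown attributes is possible because every attribute starts with a two-byte name index and a four-byte length. The sketch below assumes that the Utf8 constants are already available in a `String[]` indexed by constant pool position. It understands only "SourceFile" and skips every other attribute by its length. The attribute name `MadeUpAttribute` in the sample is invented to stand in for one the loader doesn't recognize, and the other names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Walks an attributes list, understanding only "SourceFile" and skipping all
// other attributes by their length, so unknown attributes are silently ignored.
class ClassAttributes {
    // utf8[i] holds the CONSTANT_Utf8 string at constant pool index i (or null).
    static String readSourceFile(DataInputStream in, String[] utf8) throws IOException {
        String sourceFile = null;
        int count = in.readUnsignedShort();                 // attributes_count
        for (int i = 0; i < count; i++) {
            String name = utf8[in.readUnsignedShort()];     // attribute_name_index
            int length = in.readInt();                      // attribute_length
            if ("SourceFile".equals(name)) {
                sourceFile = utf8[in.readUnsignedShort()];  // sourcefile_index (2 bytes)
            } else {
                in.readFully(new byte[length]);             // unknown: skip its body
            }
        }
        return sourceFile;
    }

    public static void main(String[] args) throws IOException {
        String[] utf8 = new String[16];
        utf8[3] = "MadeUpAttribute";    // hypothetical attribute this loader doesn't know
        utf8[9] = "SourceFile";         // constant_pool[9] in Act.class
        utf8[15] = "snipet.java";       // constant_pool[15] in Act.class
        byte[] bytes = {
            0x00, 0x02,                                     // attributes_count 2
            0x00, 0x03, 0x00, 0x00, 0x00, 0x03, 1, 2, 3,   // unknown attribute, 3-byte body
            0x00, 0x09, 0x00, 0x00, 0x00, 0x02, 0x00, 0x0F  // SourceFile -> index 15
        };
        System.out.println(readSourceFile(new DataInputStream(new ByteArrayInputStream(bytes)), utf8));
    }
}
```

Running `main` prints `snipet.java`. The unknown attribute is consumed without being interpreted.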
The applet below simulates a JVM loading a class file. The class file being loaded in the
simulation was generated by the javac compiler given the following Java source code:
class Act {
    public static void doMathForever() {
        int i = 0;
        while (true) {
            i += 1;
            i *= 2;
        }
    }
}
The above snippet of code comes from last month's article about the JVM. It is the same
doMathForever() method executed by the EternalMath applet from last month's article. I
chose this code to provide a real example that wasn't too complex. Although the code
may not be very useful in the real world, it does compile to a real class file, which is
loaded by the simulation below.
The GettingLoaded applet allows you to drive the class load simulation one step at a
time. For each step along the way you can read about the next chunk of bytes that is
about to be consumed and interpreted by the JVM. Just press the "Step" button to cause
the JVM to consume the next chunk. Pressing "Back" will undo the previous step, and
pressing "Reset" will return the simulation to its original state, allowing you to start over
from the beginning.
The JVM is shown at the bottom left consuming the stream of bytes that makes up the
class file Act.class. The bytes are shown in hex streaming out of a server on the bottom
right. The bytes travel right to left, between the server and the JVM, one chunk at a time.
The chunk of bytes to be consumed by the JVM on the next "Step" button press is
shown in red. These highlighted bytes are described in the large text area above the JVM.
Any remaining bytes beyond the next chunk are shown in black.
I've tried to fully explain each chunk of bytes in the text area. There is, therefore, a lot
of detail in the text area, and you may wish to skim through all the steps first to get the
general idea, then look back for more details.
Source Code
/********************************************************************
PROJECT: JavaWorld
MODULE: Under The Hood
FILE: GettingLoaded.java
AUTHOR: Bill Venners, June 1996
DESCRIPTION:
This file contains all the code for the Java class load simulator that
accompanies the Under The Hood article titled, "The Java Class File
Lifestyle".
I developed this under Symantec Cafe on Windows 95. As I developed it I had
each class in its own file, which made for very speedy compile and test
cycles. I lumped all the files together into this file to make it easier
to download.
This applet retrieves two files from the server, the Act.class file itself,
from which it gets the bytes to display along the bottom, and a text file
which contains the text that accompanies each step. Each block of text is
separated by a line of stars which contains one star for each byte consumed
by the step.
*********************************************************************/
import java.awt.*;
import java.applet.*;
import java.io.InputStream;
import java.io.DataInputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.EOFException;
import java.net.URL;
import java.net.URLConnection;
import java.net.MalformedURLException;
super.init();
ta.setEditable(false);
setBackground(Color.blue);
ta.setBackground(Color.white);
try {
conn = this.theClassFileURL.openStream();
data = new DataInputStream(new BufferedInputStream(conn));
try {
while (true) {
int unsignedByte = data.readUnsignedByte();
HexString hexStr = new HexString(unsignedByte, 2);
buf.append(hexStr.getString());
}
}
catch (EOFException e) {
jvmPacman.setText(buf.toString());
}
try {
ta.setText(titleString + "Loading Second Of Two Files...\n");
conn = this.theActTextURL.openStream();
data = new DataInputStream(new BufferedInputStream(conn));
buf.setLength(0);
ButtonPanel() {
setLayout(new GridLayout(3, 1, 5, 5));
setBackground(Color.blue);
Button b = new Button("Step");
b.setBackground(Color.lightGray);
add(b);
b = new Button("Back");
b.setBackground(Color.lightGray);
add(b);
b = new Button("Reset");
b.setBackground(Color.lightGray);
add(b);
}
// The string passed to the constructor and to addText must be one line to
// be printed out, excluding a closing return.
class JVMPacman extends Canvas {
JVMPacman() {
setBackground(Color.cyan);
}
theString = passedText;
stringValid = true;
}
dim = size();
int xStartingPoint = 5;
charsToWriteCount = charsThatFitBetweenRectanglesCount;
}
xTextStartingPoint += fm.stringWidth(redString);
g.setColor(Color.black);
g.drawString(theString.substring(blackStringStartingPosition,
blackStringStartingPosition + blackCharsCount),
xTextStartingPoint, yStartingPoint);
}
}
}
}
ControlPanel() {
setLayout(new BorderLayout(5, 5));
setBackground(Color.blue);
add("West", new ButtonPanel());
add("Center", jvmPacman);
}
class StepNode {
String getString() {
return theString;
}
int getByteCount() {
return byteCount;
}
StepNode getNext() {
// Should probably throw an exception here if !nextValid
return next;
}
void setNext(StepNode n) {
next = n;
nextValid = true;
}
boolean last() {
return !nextValid;
}
StepNode getPrev() {
// Should probably throw an exception here if !prevValid
return prev;
}
void setPrev(StepNode n) {
prev = n;
prevValid = true;
}
boolean first() {
return !prevValid;
}
}
// I used this class because I can't seem to set the background color of
// a label. I only want a label, but I want the background to be gray.
class ColoredLabel extends Panel {
setLayout(new GridLayout(1,1));
setBackground(color);
add(theLabel);
}
theLabel.setText(s);
}
class HexString {
buf.setLength(0);
int v = val;
for (int i = 0; i < maxNibblesToConvert; ++i) {
if (v == 0) {
if (i == 0) {
buf.insert(0, '0');
}
break;
}
Convert(val, minWidth);
return buf.toString();
}
}
First the JVM must make sure that the class file starts with the proper
magic number. In this case our JVM will be happy because it will find
the CafeBabe magic right where it's supposed to be.
By the way, all numbers are stored in the class file in big-endian
order, which means the higher order bytes come first. The very first
byte of every class file, therefore, will be 0xCA.
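Big-endian assembly can be written out by hand: shift each byte into place, high-order byte first, and mask with 0xFF so that Java's signed bytes don't sign-extend. This sketch (names hypothetical) checks the hand-built value against `DataInputStream.readInt()`, which the Java documentation specifies reads the high-order byte first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Assembles a four-byte big-endian value by hand and compares it with readInt().
class BigEndian {
    static int fromBigEndian(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 24)        // highest-order byte comes first
                | ((b[offset + 1] & 0xFF) << 16)
                | ((b[offset + 2] & 0xFF) << 8)
                | (b[offset + 3] & 0xFF);        // lowest-order byte comes last
    }

    public static void main(String[] args) throws IOException {
        byte[] start = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        int manual = fromBigEndian(start, 0);
        int viaStream = new DataInputStream(new ByteArrayInputStream(start)).readInt();
        System.out.println(Integer.toHexString(manual).toUpperCase() + " " + (manual == viaStream));
    }
}
```

This prints `CAFEBABE true`.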
****
Step 2. Version Numbers
Next, the JVM must make sure that it recognizes and fully understands the
format of the class file being loaded. If the class file's version is later
than the latest version this JVM was implemented to understand, the JVM must
reject the class file. In this case, our JVM is relieved to find that the
file has major version 45 and minor version 3, of which it has intimate
knowledge.
****
Step 3. Constant Pool Count
0011 17 constant_pool_count
The next two bytes make up an unsigned short integer which indicates
the number of elements in the constant pool array. In this case the
constant pool will have 17 elements, but because the zeroth element
doesn't appear in the class file, the JVM will expect to find elements 1
through 16 next in the stream.
**
Step 4. constant_pool[1]
07 7 tag
000C 12 name_index
Note that this is the first constant pool element and that it already has
index 1. constant_pool[0] doesn't appear in the class file.
***
Step 5. constant_pool[2]
07 7 tag
000D 13 name_index
0A 10 tag
0001 1 class_index
0004 4 name_and_type_index
0C 12 tag
000E 14 name_index
0010 16 descriptor_index
"<init>" is the name of the method being described. "()V" is the type. In
plain Java it would look like "void <init>()". "()V" is a method descriptor.
The "()" indicates that there are no arguments to "<init>". The "V" indicates
the return type of "<init>" is void.
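Method descriptors are compact enough to decode with a small recursive-descent parser. This sketch (class name hypothetical) handles the base-type letters, object types written as L...; and arrays marked with [, and translates each one into a Java-style name. It does not validate descriptors strictly. For example, it would accept V as a parameter type.

```java
import java.util.ArrayList;
import java.util.List;

// Decodes a method descriptor such as "()V" into Java-like type names.
class MethodDescriptor {
    final List<String> parameterTypes = new ArrayList<>();
    final String returnType;

    MethodDescriptor(String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        int[] pos = {1};                                  // cursor shared with parseType
        while (descriptor.charAt(pos[0]) != ')') {
            parameterTypes.add(parseType(descriptor, pos));
        }
        pos[0]++;                                         // step past ')'
        returnType = parseType(descriptor, pos);
    }

    // Parses one type starting at pos[0] and advances the cursor past it.
    private static String parseType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return parseType(d, pos) + "[]";    // array of the following type
            case 'L': {                                   // class name ends at ';'
                int end = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected '" + c + "' in " + d);
        }
    }

    public static void main(String[] args) {
        MethodDescriptor init = new MethodDescriptor("()V");
        System.out.println(init.returnType + " <init>(" + String.join(", ", init.parameterTypes) + ")");
    }
}
```

For "()V" the parser finds no parameters and a void return type, so `main` prints `void <init>()`. A descriptor such as "(I[Ljava/lang/String;)J" yields parameters `int` and `java.lang.String[]` with return type `long`.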
*****
Step 8. constant_pool[5]
01 1 tag
000D 13 length
436F6E7374616E7456616C7565 "ConstantValue" bytes[length]
01 1 tag
000D 13 length
646F4D617468466F7265766572 "doMathForever" bytes[length]
01 1 tag
000A 10 length
457863657074696F6E73 "Exceptions" bytes[length]
01 1 tag
000F 15 length
4C696E654E756D6265725461626C65 "LineNumberTable" bytes[length]
01 1 tag
000A 10 length
536F7572636546696C65 "SourceFile" bytes[length]
01 1 tag
000E 14 length
4C6F63616C5661726961626C6573 "LocalVariables" bytes[length]
01 1 tag
0004 4 length
436F6465 "Code" bytes[length]
01 1 tag
0010 16 length
6A6176612F6C616E672F4F626A656374 "java/lang/Object" bytes[length]
01 1 tag
0003 3 length
416374 "Act" bytes[length]
01 1 tag
0006 6 length
3C696E69743E "<init>" bytes[length]
01 1 tag
000B 11 length
736E697065742E6A617661 "snipet.java" bytes[length]
01 1 tag
0003 3 length
282956 "()V" bytes[length]
0000 access_flags
The access flags are a two-byte unsigned integer formed by bitwise ORing
bitmasks for individual flags that represent modifiers of the class or
interface defined by this file. For example, ACC_PUBLIC is 0x0001 and
ACC_FINAL is 0x0010. A class declared to be both public and final would have
its access flags set to 0x0011, or (ACC_PUBLIC | ACC_FINAL).
In this case no access_flags are set because the class being defined, class
Act, was not declared to be public, final, or abstract. Classes which are
declared with these modifiers would have the appropriate bitmasks from
ACC_PUBLIC, ACC_FINAL, and ACC_ABSTRACT ORed together to make the resultant
access_flags. Also, if a class file defines an interface and not a class,
then the ACC_INTERFACE bit is set in access_flags. This is how the JVM knows
whether a class or an interface is being defined by the file.
**
Step 21. This Class and Super Class
0002 2 this_class
0001 1 super_class
this_class is a two-byte unsigned integer index into the constant pool,
where constant_pool[this_class] is a CONSTANT_Class_info structure
representing the class defined by this file. In our case it is the
CONSTANT_Class_info structure for class Act.
0000 0 interfaces_count
0000 0 fields_count
interfaces_count is a two-byte unsigned integer which indicates the number
of interfaces implemented by this class. Because class Act implements no
interfaces, interfaces_count is zero in this case.
0002 2 methods_count
0009 access_flags
0006 6 name_index
0010 16 descriptor_index
The first three parts of methods[0] are shown here. access_flags gives the
modifiers with which the method was declared. In this case access_flags is
0x0009, which equates to (ACC_PUBLIC | ACC_STATIC). If you look back at the
source code, the doMathForever() method is indeed declared public and static.
The name_index indicates the constant pool entry where the name of the method
is stored. In this case name_index is 6 and constant_pool[6] is indeed the
UTF-8 string "doMathForever".
0001 1 attributes_count
methods[0].attributes[0]
000B 11 attribute_name_index
00000030 48 length
The length word indicates the number of bytes in the attribute, in this case
48 bytes. This means 48 bytes will follow this length word.
********
Step 26. doMathForever() Max Stack and Max Locals
0002 2 max_stack
0001 1 max_locals
Max stack is a two-byte unsigned integer that indicates the maximum number of
entries on the JVM's operand stack at any point in the method. Max locals is a
two-byte unsigned integer that indicates the number of local variable slots
used by the method. Each local variable slot is a four-byte word.
0000000C 12 code_length
033B8400011A05683BA7FFF9 code[code_length]
0000 0 exception_table_length
pc instruction mnemonic
-- ----------- --------
0 03 iconst_0
1 3B istore_0
2 840001 iinc 0 1
5 1A iload_0
6 05 iconst_2
7 68 imul
8 3B istore_0
9 A7FFF9 goto 2
One other thing you may notice at this point is that these bytecodes haven't
been assigned a place in memory yet. This is done by the JVM when it loads the
class file. Therefore, the value I called pc is really an offset from the
address at which a JVM loads this bytecode sequence.
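Decoding the bytecodes is mechanical once you know each opcode's length. The sketch below is a disassembler that knows only the seven opcodes that appear in doMathForever() and rejects everything else. The one subtle point is goto: the two bytes after 0xA7 form a signed 16-bit offset relative to the goto's own pc, so FFF9 (-7) at pc 9 jumps to pc 2. The names are hypothetical.

```java
// A tiny disassembler that understands only the opcodes used by doMathForever().
class TinyDisassembler {
    static String disassemble(byte[] code) {
        StringBuilder out = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            String text;
            int size;                                   // instruction length in bytes
            switch (op) {
                case 0x03: text = "iconst_0"; size = 1; break;
                case 0x05: text = "iconst_2"; size = 1; break;
                case 0x1A: text = "iload_0"; size = 1; break;
                case 0x3B: text = "istore_0"; size = 1; break;
                case 0x68: text = "imul"; size = 1; break;
                case 0x84:                              // iinc: local index, signed increment
                    text = "iinc " + (code[pc + 1] & 0xFF) + " " + code[pc + 2];
                    size = 3;
                    break;
                case 0xA7: {                            // goto: signed 16-bit branch offset
                    short offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    text = "goto " + (pc + offset);     // offset is relative to this pc
                    size = 3;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unhandled opcode 0x" + Integer.toHexString(op));
            }
            out.append(pc).append(' ').append(text).append('\n');
            pc += size;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // code[code_length] for doMathForever(), from Act.class.
        byte[] code = {0x03, 0x3B, (byte) 0x84, 0x00, 0x01, 0x1A,
                       0x05, 0x68, 0x3B, (byte) 0xA7, (byte) 0xFF, (byte) 0xF9};
        System.out.print(disassemble(code));
    }
}
```

Running `main` prints the same pc and mnemonic pairs as the table above, ending with `9 goto 2`.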
******************
Step 28. Attributes of the doMathForever() Code Attribute
0001 1 attributes_count
0008 8 attribute_name_index
00000012 18 attribute_length
0004 4 line_number_table_length
line 4: i = 0;
line 5: while (true) {
line 6: i += 1;
line 7: i *= 2;
line_number_table[1]
0002 2 start_pc iinc 0 1
0006 6 line_number i += 1
line_number_table[2]
0005 5 start_pc iload_0, iconst_2, imul, istore_0
0007 7 line_number i *= 2
line_number_table[3]
0009 9 start_pc goto 2
0005 5 line_number while (true) {
The above table associates each line in the doMathForever() code with its
corresponding bytecode instruction. The start_pc value of each table entry
indicates the zero-based byte position inside the doMathForever() bytecode
sequence. The line_number value of each table entry is the line number of the
source file that corresponds to the start_pc position in the bytecode
sequence.
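To find the source line for a given pc, a tool such as a debugger picks the table entry with the largest start_pc that is not greater than that pc. The sketch below does this using the three entries shown above (line_number_table[1] through [3]). The names and the array representation are hypothetical. It scans every entry rather than assuming the table is sorted.

```java
// Maps a bytecode offset (pc) to a source line using line number table entries.
class LineNumberLookup {
    // Returns the line for pc, or -1 if no entry starts at or before pc.
    static int lineFor(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] entry : table) {                       // entry = {start_pc, line_number}
            int startPc = entry[0];
            if (startPc <= pc && startPc > bestStart) {   // closest start at or before pc
                bestStart = startPc;
                line = entry[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{2, 6}, {5, 7}, {9, 5}};         // line_number_table[1..3] above
        for (int pc = 2; pc <= 9; pc++) {
            System.out.println("pc " + pc + " -> line " + lineFor(table, pc));
        }
    }
}
```

With these entries, pcs 2 through 4 (the iinc) map to line 6, pcs 5 through 8 map to line 7, and pc 9 (the goto) maps back to line 5, the while statement.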
This is the last of the information in the class file about the
doMathForever() method. Next will come the information about the other method
of this class, the constructor Act().
****************
Step 30. Act()'s Access Flags, Name Index, and Descriptor Index
methods[1]
0000 access_flags
000E 14 name_index
0010 16 descriptor_index
The first three parts of methods[1] are shown here. access_flags gives the
modifiers with which the method was declared. In this case access_flags is
0x0000, which means it's not public, private, protected, static, final,
synchronized, native, or abstract. It's just a plain old method.
The name_index indicates the constant pool entry where the name of the method
is stored. In this case name_index is 14 and constant_pool[14] is indeed the
UTF-8 string "<init>". "<init>" is a special internal method name used to
represent constructors.
0001 1 attributes_count
000B 11 attribute_name_index
0000001D 29 attribute_length
The length word indicates the number of bytes in the attribute, in this case
29 bytes. This means 29 bytes will follow this length word.
********
Step 32. Act() Max Stack and Max Locals
0001 1 max_stack
0001 1 max_locals
Max stack is a two-byte unsigned integer that indicates the maximum number of
entries on the JVM's operand stack at any point in the method, in this case 1.
Max locals is a two-byte unsigned integer that indicates the number of local
variable slots used by the method, in this case 1. Each local variable slot is
a four-byte word.
****
Step 33. Act() Bytecodes and Exception Table
00000005 5 code_length
2AB70003B1 code[code_length]
0000 0 exception_table_length
pc instruction mnemonic
-- ----------- --------
0 2A aload_0
1 B70003 invokenonvirtual #3 <Method java.lang.Object.<init>()V>
4 B1 return
Act()'s line number table associates a line in the Java source file with the
starting instruction of the Act() method. The start_pc value of the table
entry indicates the zero-based byte position inside the Act() bytecode
sequence. The line_number value of the table entry is the line number of the
source file that corresponds to the start_pc position in the bytecode
sequence.
This is the last of the information in the class file about the Act()
method, which was the last method to be described by this file. The last
section of the class file follows, the general class file attributes.
****
Step 36. General Attributes
0001 1 attributes_count
0009 9 attribute_name_index
00000002 2 attribute_length
000F 15 sourcefile_index
A class file can have any number of attributes at the end. In this case
attributes_count is 1, so there is only one attribute here. The
attribute_name_index is 9, and constant_pool[9] is the string "SourceFile".
Therefore, this is the "SourceFile" attribute. attribute_length gives the
length of the attribute as 2, which means that two bytes will follow the
attribute_length field. The last two bytes are the sourcefile_index, which in
this case is 15. constant_pool[15] is the string "snipet.java", which, as it
happens, is the name of the source file I compiled with javac to generate this
file, Act.class.
**********