Introduction to JVM and JVM languages
The Java Virtual Machine (JVM for short) is a platform-dependent piece of software that allows you to execute programs written in languages like Java. Languages such as Scala and Kotlin also use the JVM for execution, and for this reason they are often referred to as JVM languages. Source files in these languages are usually identified by their file extensions, such as .java and .scala. Compiling these source files produces .class files, a binary representation of your code that contains the information the JVM needs to execute it. Each class file begins with the magic number 0xCAFEBABE, which identifies the format.
This is how a class file is represented as per the Java Virtual Machine Specification:
ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
Note: Field sizes are written as ux, where x is the number of bytes in an unsigned quantity. For example, u2 is an unsigned value that takes up 2 bytes (16 bits), and u4 takes up 4 bytes (32 bits). You can use javap to generate a readable representation of a class file:
javac Main.java
javap -c -v Main
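To make the header layout concrete, here is a minimal sketch that reads the first four fields of the ClassFile structure from raw bytes. The class ClassFileHeader and its parse method are hypothetical names used for illustration; the sketch only relies on java.nio.ByteBuffer, whose default byte order is big-endian, matching the class file format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of the JDK) that reads magic, minor_version,
// major_version and constant_pool_count from the start of a class file.
final class ClassFileHeader {
    final long magic;            // u4, must be 0xCAFEBABE
    final int minorVersion;      // u2
    final int majorVersion;      // u2
    final int constantPoolCount; // u2, number of constant pool entries plus one

    private ClassFileHeader(long magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // Class files are big-endian, which is also ByteBuffer's default byte order.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        long magic = buf.getInt() & 0xFFFFFFFFL; // read the u4 as an unsigned value
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;     // read each u2 as an unsigned value
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major, cpCount);
    }
}
```

For a real file, you could pass it the output of java.nio.file.Files.readAllBytes(java.nio.file.Path.of("Main.class")). Masking with 0xFFFF and 0xFFFFFFFFL is needed because Java's short and int are signed, while u2 and u4 are unsigned.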
Constant Pool
The constant pool of a class is a sort of key-value store containing entries for things like String constants, as well as references to all the classes and methods the class uses. The type of each constant pool entry is indicated by a single byte, often referred to as a “constant pool tag”. In current editions of the JVM specification, tag values range from 1 to 20, although not every value in that range is used.
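The tag values can be illustrated with a small lookup. This is a sketch based on the constant pool tag table in section 4.4 of the JVM specification; ConstantPoolTags and name are hypothetical names, and the unused values (2, 13 and 14) are rejected.

```java
// Hypothetical lookup from a constant pool tag byte to the kind of entry it introduces.
final class ConstantPoolTags {
    static String name(int tag) {
        switch (tag) {
            case 1:  return "Utf8";
            case 3:  return "Integer";
            case 4:  return "Float";
            case 5:  return "Long";
            case 6:  return "Double";
            case 7:  return "Class";
            case 8:  return "String";
            case 9:  return "Fieldref";
            case 10: return "Methodref";
            case 11: return "InterfaceMethodref";
            case 12: return "NameAndType";
            case 15: return "MethodHandle";
            case 16: return "MethodType";
            case 17: return "Dynamic";
            case 18: return "InvokeDynamic";
            case 19: return "Module";
            case 20: return "Package";
            default: throw new IllegalArgumentException("Unknown constant pool tag: " + tag);
        }
    }
}
```

A parser reads this byte first and uses it to decide how many bytes the rest of the entry occupies.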
Consider the following snippet:
// Main.java
class Foo {
    public void bar() {
    }
}

public class Main {
    public static void main(String[] args) {
        Foo f = new Foo();
        f.bar();
        String lang = "java";
    }
}
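The constant "java" is stored in the constant pool as a String entry, which points to a Utf8 entry holding the actual characters:
#13 = String #14 // java
#14 = Utf8 java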
You can generalize the format as:
#index = type value
You will also find information on classes and methods used within this class in its constant pool:
// Main.class
 #6 = Utf8               ()V
 #7 = Class              #8         // Foo
 #8 = Utf8               Foo
 #9 = Methodref          #7.#3      // Foo."<init>":()V
#10 = Methodref          #7.#11     // Foo.bar:()V
#11 = NameAndType        #12:#6     // bar:()V
#12 = Utf8               bar
A class reference (an entry of type Class) contains a single index pointing to a Utf8 entry that holds the name of the referenced class. Method references (Methodref entries) are more complex and take the form <Class>.<NameAndType>: one index points to a Class entry and the other to a NameAndType entry. The NameAndType entry in turn points to two Utf8 entries: the name of the method and its descriptor.
Any entry that references another entry contains the index of that other entry. For example, index 7 holds this entry: #7 = Class #8 // Foo. It refers to a class whose name is stored at index 8, and the entry at index 8 is a Utf8 entry containing the class name, Foo.
Every index stored in a constant pool entry must be a valid index into that same constant pool.
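Following these indices by hand is exactly what javap does when it prints a comment such as // Foo.bar:()V. The sketch below models a few entry kinds in memory (ConstantPoolModel and its nested classes are hypothetical, and real entries would be parsed from the class file) and resolves a Methodref by walking from Methodref to Class and NameAndType, and from there to Utf8 entries. It also rejects indices that are missing or hold the wrong kind of entry, in the spirit of the validity rule above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified in-memory model of a few constant pool entry kinds.
final class ConstantPoolModel {
    static final class Utf8 {
        final String value;
        Utf8(String value) { this.value = value; }
    }

    static final class ClassRef {
        final int nameIndex; // points to a Utf8 entry
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    static final class NameAndType {
        final int nameIndex, descriptorIndex; // both point to Utf8 entries
        NameAndType(int nameIndex, int descriptorIndex) {
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
        }
    }

    static final class Methodref {
        final int classIndex, nameAndTypeIndex;
        Methodref(int classIndex, int nameAndTypeIndex) {
            this.classIndex = classIndex;
            this.nameAndTypeIndex = nameAndTypeIndex;
        }
    }

    private final Map<Integer, Object> entries = new HashMap<>();

    ConstantPoolModel put(int index, Object entry) {
        entries.put(index, entry);
        return this;
    }

    // Looks up an index and checks that it exists and holds the expected kind of entry.
    private <T> T get(int index, Class<T> kind) {
        Object entry = entries.get(index);
        if (!kind.isInstance(entry)) {
            throw new IllegalStateException("#" + index + " is not a valid " + kind.getSimpleName() + " entry");
        }
        return kind.cast(entry);
    }

    // Resolves a Methodref into the "Owner.name:descriptor" form that javap prints.
    String describeMethodref(int index) {
        Methodref ref = get(index, Methodref.class);
        String owner = get(get(ref.classIndex, ClassRef.class).nameIndex, Utf8.class).value;
        NameAndType nat = get(ref.nameAndTypeIndex, NameAndType.class);
        String name = get(nat.nameIndex, Utf8.class).value;
        String descriptor = get(nat.descriptorIndex, Utf8.class).value;
        return owner + "." + name + ":" + descriptor;
    }
}
```

Populated with entries #6, #7, #8, #10, #11 and #12 from the Main.class listing, describeMethodref(10) yields Foo.bar:()V.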
Introduction to bytecode representation
The readable representation of the bytecode for the main method in the above example, as printed by javap, is:
 0: new           #7     // class Foo
 3: dup
 4: invokespecial #9     // Method Foo."<init>":()V
 7: astore_1
 8: aload_1
 9: invokevirtual #10    // Method Foo.bar:()V
12: ldc           #13    // String java
14: astore_2
15: return
The comments you see here are annotations added by javap, which resolves each constant pool index for you; they are not part of the bytecode itself.
Each line of a method’s representation describes a single bytecode instruction in the following format:
offset: instruction arg1, arg2
You may have noticed that the instruction offsets shown here are not consecutive. The first instruction is at 0, while the second one starts at 3. This is because each instruction occupies one byte for its opcode plus however many bytes its operands need, and those operands are embedded directly in the bytecode. For example, the invokespecial instruction requires one 2-byte operand. Similarly, the new instruction at the start takes a 2-byte operand that occupies offsets 1 and 2, which is why 3 is the next available offset for an instruction.
Note: A method's bytecode is stored as a byte array, and offsets into it are not the same as constant pool indices.
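The offsets in the listing can be reproduced by walking the byte array and skipping each opcode's operands. This is a minimal sketch that only knows the opcodes used in main (the opcode values come from the JVM specification's instruction set chapter); OffsetWalker is a hypothetical name, and the byte array in the usage below is assumed to match the listing rather than taken from an actual class file dump.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: computes instruction offsets by skipping each opcode's operand bytes.
final class OffsetWalker {
    // Total length in bytes (opcode + operands) of each supported instruction.
    static int length(int opcode) {
        switch (opcode) {
            case 0xBB: // new, followed by a 2-byte constant pool index
            case 0xB7: // invokespecial, followed by a 2-byte constant pool index
            case 0xB6: // invokevirtual, followed by a 2-byte constant pool index
                return 3;
            case 0x12: // ldc, followed by a 1-byte constant pool index
                return 2;
            case 0x59: // dup
            case 0x4C: // astore_1
            case 0x2B: // aload_1
            case 0x4D: // astore_2
            case 0xB1: // return
                return 1;
            default:
                throw new IllegalArgumentException("Unsupported opcode: 0x" + Integer.toHexString(opcode));
        }
    }

    static List<Integer> offsets(byte[] code) {
        List<Integer> result = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            result.add(pc);
            pc += length(code[pc] & 0xFF); // jump over the opcode and its operands
        }
        return result;
    }
}
```

Feeding it the 16 bytes BB 00 07 59 B7 00 09 4C 2B B6 00 0A 12 0D 4D B1 yields the offsets 0, 3, 4, 7, 8, 9, 12, 14 and 15 from the listing.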
Method invocation
The JVM uses different instructions, such as invokevirtual, invokespecial, and invokestatic, to invoke methods depending on their nature. For example, constructors (as well as super calls) are invoked via invokespecial, static methods via invokestatic, and ordinary instance methods via invokevirtual. Instructions such as invokeinterface and invokedynamic fall outside this blog’s scope.
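A small example helps connect source code to these instructions. The classes below are hypothetical, and the comments name the instruction javac normally emits for each call; compiling them and running javap -c on Counter and InvokeKinds is the way to confirm this for your compiler version. String concatenation is avoided on purpose, since recent javac versions compile it with invokedynamic.

```java
// Hypothetical example; comments show the invoke instruction javac typically emits.
class Counter {
    private int count;

    Counter() {
        super();                 // invokespecial java/lang/Object."<init>":()V
    }

    static Counter create() {
        return new Counter();    // new, dup, invokespecial Counter."<init>":()V
    }

    void increment() {
        count++;                 // field access only, no method call
    }

    int get() {
        return count;
    }
}

final class InvokeKinds {
    static int demo() {
        Counter c = Counter.create(); // invokestatic Counter.create:()LCounter;
        c.increment();                // invokevirtual Counter.increment:()V
        c.increment();                // invokevirtual Counter.increment:()V
        return c.get();               // invokevirtual Counter.get:()I
    }
}
```

Calling InvokeKinds.demo() exercises all three instructions and returns 2.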
Let’s take a closer look at the invokevirtual instruction in the listing for main:
9: invokevirtual #10 // Method Foo.bar:()V
In the example above, invokevirtual is at offset 9. It takes one 2-byte operand, stored at offsets 10 and 11. This operand is interpreted as the index of a Methodref entry in the class’s constant pool. Here the index is 10, and javap has helpfully included the value of that entry as a comment: Method Foo.bar:()V. This gives the JVM all the information it needs to invoke the specified method, Foo.bar(). Any arguments are passed to the invoked method beforehand by pushing values onto the operand stack using instructions from the *const and *load families.
Note: Here, we say *load because it stands for an entire family of instructions. The prefix tells us what kind of value is loaded from a local variable: an integer, a floating-point number, or even an object reference. Examples of instructions in this family are aload, iload, fload, etc. The same principle applies to the *const family, which pushes integer and floating-point constants (and, as a special case, null) onto the operand stack.
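Reading invokevirtual's operand at offset 9 comes down to combining two bytes into an unsigned big-endian 16-bit value. The sketch below does this on the bytes of main's bytecode (assumed to be BB 00 07 59 B7 00 09 4C 2B B6 00 0A 12 0D 4D B1, matching the listing); OperandReader and its methods are hypothetical names.

```java
// Hypothetical sketch: reads the constant pool index that follows an invokevirtual opcode.
final class OperandReader {
    static final int INVOKEVIRTUAL = 0xB6;

    // Combines two bytes into an unsigned, big-endian 16-bit value (a u2).
    static int readU2(byte[] code, int offset) {
        return ((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF);
    }

    static int methodrefIndexAt(byte[] code, int offset) {
        if ((code[offset] & 0xFF) != INVOKEVIRTUAL) {
            throw new IllegalArgumentException("No invokevirtual at offset " + offset);
        }
        // The operand occupies the two bytes right after the opcode.
        return readU2(code, offset + 1);
    }
}
```

For those bytes, methodrefIndexAt(code, 9) returns 10, the Methodref index that javap annotates as Foo.bar:()V.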
Control flow
Conditional statements (if), loops, and unconditional jumps are important parts of control flow. Let’s look at how the JVM executes each of these.
Prerequisites: Local variable array and operand stack
Every method invocation gets a small area on the thread’s JVM stack called a frame. A frame holds the method’s local variables, its operand stack, and a reference to the run-time constant pool of the method’s class.
The operand stack is, as its name indicates, a stack structure. It is used to store input and output data for instructions. For example, the iadd instruction expects two integer values to be on the operand stack beforehand. It pops both operands, adds them, and pushes the result back onto the operand stack for subsequent instructions to use.
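The behavior of iadd can be sketched with an ordinary Deque. OperandStackDemo and its methods are hypothetical, and unlike a real JVM the sketch stores boxed Integer values instead of typed slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of how iadd consumes and produces operand stack values.
final class OperandStackDemo {
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Stands in for constant-pushing instructions such as iconst_2 or bipush.
    void pushInt(int value) {
        stack.push(value);
    }

    void iadd() {
        int value2 = stack.pop();    // the top of the stack is the second operand
        int value1 = stack.pop();    // the value below it is the first operand
        stack.push(value1 + value2); // the result replaces both operands
    }

    int peek() { return stack.peek(); }

    int depth() { return stack.size(); }
}
```

After pushInt(2), pushInt(3) and iadd(), the stack holds a single value, 5.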
A method’s parameters and any local variables declared within it have predetermined slots in the corresponding frame’s local variable array (values of type long and double take up two slots each). For instance (non-static) methods, the first entry in the local variable array is always a reference to the object that this refers to. Before the call, the calling method must push that object and the invoked method’s arguments onto its own operand stack.
When invokevirtual executes, the number of argument values to pop from the operand stack is determined from the invoked method’s descriptor. That many values, plus one more for the this reference, are popped from the operand stack and placed into the local variable array of the new frame, with this first, followed by the arguments in their declared order.
Once the arguments are copied over, the JVM sets the program counter to the first instruction of the invoked method and continues executing bytecode. When the method executes a return instruction, its frame is discarded and control returns to the instruction after invokevirtual in the caller. Any return value is popped off the operand stack of the invoked method and pushed onto the operand stack of the caller, where subsequent instructions can use it.
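The argument transfer described above can be sketched as follows. FrameSketch, countArguments and enterFrame are hypothetical names; the model counts one slot per argument and deliberately ignores that long and double values occupy two slots, and it stores values as plain Objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of how invokevirtual fills a new frame's local variable array.
final class FrameSketch {
    // Counts the parameters in a method descriptor such as "(ILjava/lang/String;[J)V".
    static int countArguments(String descriptor) {
        int count = 0;
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            while (descriptor.charAt(i) == '[') {
                i++; // array dimensions prefix the element type
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // object types end with ';'
            }
            i++; // move past the last character of this parameter type
            count++;
        }
        return count;
    }

    // Pops the arguments and the receiver off the caller's stack into a new local variable array.
    static Object[] enterFrame(Deque<Object> callerStack, String descriptor) {
        int argCount = countArguments(descriptor);
        Object[] locals = new Object[argCount + 1];
        // The last argument was pushed last, so it is popped first.
        for (int slot = argCount; slot >= 1; slot--) {
            locals[slot] = callerStack.pop();
        }
        locals[0] = callerStack.pop(); // the receiver (this) was pushed before the arguments
        return locals;
    }
}
```

If the caller pushes a receiver followed by the ints 1 and 2, enterFrame with the descriptor (II)V produces the local variable array [receiver, 1, 2] and leaves the caller's stack empty.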
If condition
Consider the following snippet and its bytecode:
int i = 0;
if (i == 0) {
    i++;
}
// Explanatory comments added for better understanding
0: iconst_0    // Push the constant 0 onto the stack
1: istore_1    // Pop the value off the stack and store it in local variable slot 1 (i)
2: iload_1     // Push the value of local variable slot 1 onto the stack
3: ifne 9      // Pop the value; if it is not equal to 0, continue execution at offset 9
6: iinc 1, 1   // Increment local variable slot 1 by 1
9: return      // End of method
Instructions such as ifeq, ifne, iflt, ifge, ifgt, and ifle are used when a value (i in this case) is compared against 0. These instructions pop a value off the stack, compare it against 0, and, if the condition holds, jump to the specified offset. Notice that the compiler emitted the inverse of the source condition: the code tests i == 0, but the bytecode uses ifne to jump past the body of the if when i is not 0. Instructions of the form if_icmpxx (where xx is one of eq, ne, lt, gt, ge, le) instead pop two int values off the stack and compare them with each other.
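One detail javap hides: it prints absolute jump targets such as ifne 9, but the class file stores a signed 16-bit offset relative to the offset of the branch instruction itself. For the if example, ifne sits at offset 3 and stores +6, so the target is 3 + 6 = 9. The sketch below (BranchTarget is a hypothetical name, and the byte array is assumed to match the listing) performs that conversion.

```java
// Hypothetical sketch: converts a branch instruction's stored operand into an absolute target.
final class BranchTarget {
    static int target(byte[] code, int opcodeOffset) {
        // The two operand bytes form a signed, big-endian 16-bit offset.
        short relative = (short) (((code[opcodeOffset + 1] & 0xFF) << 8) | (code[opcodeOffset + 2] & 0xFF));
        // The offset is relative to the branch opcode's own position.
        return opcodeOffset + relative;
    }
}
```

Because the offset is signed, the same computation also handles backward jumps, where the stored value is negative.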
Loops
Consider the following snippet and its bytecode:
for (int i = 0; i <= 10; i++) {
    // loop body
}
// Explanatory comments added for better understanding
 0: iconst_0        // Push 0 onto the stack
 1: istore_1        // Pop the int value (0) and store it in local variable slot 1 (i)
 2: iload_1         // Push the value of local variable slot 1 onto the stack
 3: bipush 10       // Push the constant 10 onto the stack
 5: if_icmpgt 14    // Pop both values (i and 10); if i > 10, continue execution at offset 14
 8: iinc 1, 1       // Increment local variable slot 1 by 1
11: goto 2          // Jump back to offset 2 and re-evaluate the loop condition
14: return
A loop is simply a set of statements executed repeatedly while its condition holds. The generated bytecode is similar to the if example, and the condition is again compiled as its inverse: the source says i <= 10, so if_icmpgt exits the loop once i > 10. The main difference is the goto instruction, which jumps back to an earlier offset so that the condition is re-evaluated and the body executes again, keeping the loop running.
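To see the loop bytecode actually run, here is a minimal interpreter sketch that supports only the eight opcodes in the listing. LoopInterpreter is a hypothetical name, the opcode values come from the JVM specification, and the encoded operands (+9 for if_icmpgt, -9 for goto) are assumed values derived from the offsets shown above, since branch operands are stored relative to the branch instruction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical mini-interpreter for the loop's bytecode; it supports only the opcodes used there.
final class LoopInterpreter {
    static int run(byte[] code, int maxLocals) {
        int[] locals = new int[maxLocals];
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: offset of the current instruction
        while (true) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x03: // iconst_0
                    stack.push(0); pc += 1; break;
                case 0x10: // bipush: push the signed byte operand
                    stack.push((int) code[pc + 1]); pc += 2; break;
                case 0x3C: // istore_1
                    locals[1] = stack.pop(); pc += 1; break;
                case 0x1B: // iload_1
                    stack.push(locals[1]); pc += 1; break;
                case 0x84: // iinc index, const (const is a signed byte)
                    locals[code[pc + 1] & 0xFF] += code[pc + 2]; pc += 3; break;
                case 0xA3: { // if_icmpgt: jump if value1 > value2
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    pc = value1 > value2 ? pc + branchOffset(code, pc) : pc + 3;
                    break;
                }
                case 0xA7: // goto: unconditional relative jump
                    pc += branchOffset(code, pc); break;
                case 0xB1: // return: stop and expose local slot 1 for inspection
                    return locals[1];
                default:
                    throw new IllegalStateException("Unsupported opcode 0x" + Integer.toHexString(opcode) + " at " + pc);
            }
        }
    }

    // Signed 16-bit offset stored after a branch opcode, relative to the opcode's offset.
    private static int branchOffset(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
    }
}
```

Running it on the 15 bytes 03 3C 1B 10 0A A3 00 09 84 01 01 A7 FF F7 B1 with two local slots returns 11: the body runs for i = 0 through 10, and the loop exits once if_icmpgt sees 11 > 10.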
The JVM is one of the most exciting platforms out there, and what we’ve covered in this blog is only a tiny fraction of how it works internally. If you wish to delve further into the JVM and its technicalities, consider starting with The Java Virtual Machine Specification.
Published on Java Code Geeks with permission by DeepSource, partner at our JCG program. See the original article here: Method invocation and control flow in JVM. Opinions expressed by Java Code Geeks contributors are their own.