Core Java

Self-healing JVM

Fixing yourself to clear weakness and gain superioityThis post is an example about an application where the first solution to each and every IT problem – “have you tried turning it off and on again” – can backfire and do more harm than good.

Instead of turning things off and on, we have an application that can literally heal itself: it fails at the beginning, but starts running smoothly after some time. To give an example of such application in action, we recreated it in the most simple form possible, gathering inspiration from what is now a five-year old post from the Heinz Kabutz’ Java Newsletter:

package eu.plumbr.test;

public class HealMe {
  private static final int SIZE = (int) (Runtime.getRuntime().maxMemory() * 0.6);

  public static void main(String[] args) throws Exception {
    for (int i = 0; i < 1000; i++) {
      allocateMemory(i);
    }
  }

  private static void allocateMemory(int i) {
    try {
      {
        byte[] bytes = new byte[SIZE];
        System.out.println(bytes.length);
      }

      byte[] moreBytes = new byte[SIZE];
      System.out.println(moreBytes.length);

      System.out.println("I allocated memory successfully " + i);

    } catch (OutOfMemoryError e) {
      System.out.println("I failed to allocate memory " + i);
    }
  }
}

The code above is allocating two bulks of memory in a loop. Each of those allocation is equal to 60% of the total available heap size. As the allocations occur sequentially in the same method, then one might expect this code to keep throwing java.lang.OutOfMemoryError: Java heap space errors and never successfully complete the allocateMemory() method.

So, lets see whether our expectations are correct by starting with the static analysis of the source code:

  1. From the first fast examination, this code really cannot complete, because we try to allocate more memory than is available to JVM.
  2. If we look closer we can notice that the first allocation takes place in a scoped block, meaning that the variables defined in this block are visible only to this block. This indicates that the bytes should be eligible for GC after the block is completed. And so our code should in fact run fine right from the beginning, as at the time when it tries to allocate moreBytes the previous allocation bytes should be dead.
  3. If we now look into the compiled classfile, we will see the following bytecode:
private static void allocateMemory(int);
    Code:
       0: getstatic     #3                  // Field SIZE:I
       3: newarray       byte
       5: astore_1      
       6: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
       9: aload_1       
      10: arraylength   
      11: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
      14: getstatic     #3                  // Field SIZE:I
      17: newarray       byte
      19: astore_1      
      20: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
      23: aload_1       
      24: arraylength   
      25: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
---- cut for brevity ----

Here we see, that on offsets 3-5 first array is allocated and stored into local variable with index 1. Then, on offset 17 another array is going to be allocated. But first array is still referenced by local variable and so the second allocation should always fail with OOM. Bytecode interpreter just cannot let GC clean up first array, because it is still strongly referenced.

Our static code analysis has shown us that for two underlying reasons, the presented code should not run successfully and in one case, it should. Which one out of those three is the correct one? Let us actually run it and see for ourselves. It turns out that both conclusions were correct. First, application fails to allocate memory. But after some time (on my Mac OS X with Java 8 it happens at iteration #255) the allocations start succeeding:

java -Xmx2g eu.plumbr.test.HealMe
1145359564
I failed to allocate memory 0
1145359564
I failed to allocate memory 1

… cut for brevity ...

I failed to allocate memory 254
1145359564
I failed to allocate memory 255
1145359564
1145359564
I allocated memory successfully 256
1145359564
1145359564
I allocated memory successfully 257
1145359564
1145359564
Self-healing code is a reality! Skynet is near...

In order to understand what is really happening we need to think, what changes during program execution? The obvious answer is, of course, Just-In-Time compilation can occur. If you recall, Just-In-Time compilation is a JVM built-in mechanics to optimize code hotspots. For this, the JIT monitors the running code and when a hotspot is detected, JIT compiles your bytecode into native code, performing different optimizations such as method inlining and dead code elimination in the process.

Lets see if this is a case by turning on the following command line options and relaunching the program:

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation

This will generate a log file, in our case named hotspot_pid38139.log, where 38139 was the PID of your java process. In this file the following line can be found:

<task_queued compile_id='94' method='HealMe allocateMemory (I)V' bytes='83' count='256' iicount='256' level='3' stamp='112.305' comment='tiered' hot_count='256'/>

This means, that after executing allocateMemory() methods 256 times, C1 compiler has decided to queue this method for C1 tier 3 compilation. You can get more information about tiered compilation’s levels and different thresholds here. And so our first 256 iterations were run in interpreted mode, where bytecode interpreter, being a simple stack machine, cannot know in advance, if some variable, bytes in this case, will be used further on or not. But JIT sees the whole method at once and so can deduce than bytes will not be used anymore and is, in fact, GC eligible. Thus the garbage collection can eventually take place and our program has magically self-healed. Now, I can only hope none of the readers should actually be responsible for debugging such a case in production. But in case you wish to make someone’s life a misery, introducing code like this to production would be a sure way to achieve this.

Reference: Self-healing JVM from our JCG partner Nikita Salnikov Tarnovski at the Plumbr Blog blog.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button