Comparing Kotlin performance with Graal and C2
You may have heard of Graal, the new JIT compiler for the JVM written in Java. It is available inside the JDK since Java10 and in the future will probably become the standard of the JDK.
If you are interested, you can find more information here: https://www.infoq.com/articles/Graal-Java-JIT-Compiler
In the last year I mostly worked with Kotlin and as personal project I have implemented a bot to play the game of Go in Kotlin. You can find the source here: https://github.com/uberto/kakomu
As you can imagine, in any Game Bot (Chess, Go, etc.) the ability of the computer player is directly proportional to its speed. In my case the main engine is based on the MonteCarloSearch algorithm, which simulate thousands of random game to evaluate each position.
So why not let give Graal a run on it?
Since Java 10 you can activate Graal simply using VMOptions at start.
First a word about testing methodology:
All tests run on my Notebook with i7 2.0Ghz 16Gb Ubuntu 18.4 with OpenJdk 11 graal VM Options: -Xms6g -Xmx6g -XX:+UseParallelOldGC -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler c2 VM Options: -Xms6g -Xmx6g -XX:+UseParallelOldGC I run a test called PerformanceTest in each project which execute a end to end benchmark continuously. In this way the compiler cannot optimize the code for a specific case. Then I choose the faster result, assuming it is the one without GC and OS pauses. All tests are running mono-thread.
And here the results:
As you can see Graal compiled Kotlin is sensibly faster on the small board and only marginally faster with the big board random plays.
I suspect this is because with the bigger board memory management is taking a big chunk of the running time. In any case the increment is more than welcome, especially considering that I usually play on the smaller board.
To test my theory I created a new project with some implementation of classical algorithms to see the difference in performance. You can find it here: https://github.com/uberto/kotlin-perf
For the moment there are two such algorithms: a Mandelbrot Set generator and a Knapsack solver.
Mandelbrot Set
Wikipedia reference: https://en.wikipedia.org/wiki/Mandelbrot_set
The Mandelbrot Set is probably the most famous fractal shape and you may have seen it even if you don’t know the name.
Mathematically is defined as the Set of all points in the Complex plane where the function z <- z^2+c will not diverge when iterated. It is quite easy to generate the set iterating the functions for some points on the complex plane and creating an image out of them.
Since the goal here is the performance and not the graphics I kept things simple using text graphic.
Let’s start looking at the code for the Mandelbrot Set.
data class Complex(val r: Double, val i: Double){ operator fun times(other: Complex) = Complex( r = this.r * other.r - this.i * other.i, i = this.i * other.r + this.r * other.i ) operator fun plus(other: Complex) = Complex(r = this.r + other.r, i = this.i + other.i) operator fun minus(other: Complex) = Complex(r = this.r - other.r, i = this.i - other.i) fun squared() = this * this fun squaredModule() = r * r + i * i fun Double.toComplex() = Complex(r=this, i=0.0) }
fun mandelSet(initZ: Complex, c:Complex, maxIter: Int): Int { var z = initZ (1..maxIter).forEach{ z = z.squared() + c if (z.squaredModule() >= 4) return it } return maxIter }
You can see here how using operator overloading and a data class to represent complex numbers can really simplify the code and making easier to understand.
Once we defined in the Complex class the rules to do operations on complex numbers, the mandelSet function has only to check if the operation z <- z^2+c is “escaping” or not and in case after how many iteration it will exceed the threshold of 4.
Here you can see the characteristic cardioid shape of Mandelbrot Set in the output rendered in AsciiArt:
Knapsack problem
The wikipedia reference for the Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
The Knapsack problem can be defined in several ways. Imagine to be a thief that just broke into a watch shop. You can steal as many watches you want provided that you don’t exceed the maximum weight you can carry on your knapsack.
Being a rational thief, you definitely want to optimize the value of the watches you will bring out with you. Every watch has a price and a weight attached. So you need to find the set of Watches with maximum total price for a given weight.
Real world applications include optimizing cuts and materials for CNC applications and marketing strategies for allocating advertising budget.
For example let’s start with a shop with only 3 watches defined like this:
val shop = Knapsack.shop( Watch(weight = 1, price = 1), Watch(weight = 3, price = 2), Watch(weight = 1, price = 3) )
If we have a maximum weight of 1, it’s better for us to pick up the third watch, rather than the first, because the value is higher.
If we have a maximum weight of 3, we can choose the number 2 (price 2) or number 1 and 3 (price 1 + 3). In this case it’s better to pick up 1 and 3 even if their combined weight is less than the maximum.
These are the complete solutions for this shop:
assertEquals(3, selectWatches(shop, maxWeight = 1)) assertEquals(4, selectWatches(shop, maxWeight = 2)) assertEquals(4, selectWatches(shop, maxWeight = 3)) assertEquals(5, selectWatches(shop, maxWeight = 4)) assertEquals(6, selectWatches(shop, maxWeight = 5))
As you can see, as the number of watches available increase, the number of possible choices become high very, very quickly. It is a classical NP-Hard problem.
To solve it in reasonable times we need to cheat a bit and use Dynamic Programming. We can build a map with the already optimized solution for a each set of watches and so we can avoid recalculating them each time.
The general algorithm is based on exhaustive search based on recursion. This is the Kotlin code to solve it, separated in a memoization function and the recursive search of maximum value.
typealias Memoizer = MutableMap<String, Int> fun priceAddingElement(memo: Memoizer, shop: Set<Watch>, choice: Set<Watch>, maxWeight: Int, priceSum: Int): Int = shop.filter { !(it in choice) && it.weight <= maxWeight } .map { selectWatches( memo, shop, maxWeight - it.weight, choice + it, priceSum + it.price) } .filter { it > priceSum } .max() ?: priceSum fun selectWatches(memo: Memoizer, shop: Set<Watch>, maxWeight: Int, choice: Set<Watch>, priceSum: Int): Int = memoization(memo, generateKey(choice)) { priceAddingElement(memo, shop, choice, maxWeight, priceSum)} private fun memoization(memo: Memoizer, key: String, f: () -> Int): Int = when (val w = memo[key]) { null -> f().also { memo[key] = it } else -> w }
I really like how Kotlin allows to express the intention clearly without having to repeat yourself. If you don’t know Kotlin I hope this code can intrigue you enough to try it someday.
Benchmarks
And now the part you were waiting all along. Let’s compare the performance of Graal versus the good old C2 compiler.
Let’s remember that Graal is written in Java and it is taking advantage of new researches in the compiler space, but it is still relatively young. C2 on the other side is extremely well tuned and mature.
The first surprise is with the Mandelbrot example:
To be honest, I didn’t expect such big difference in performance. Graal is about 18% faster than C2! Just to be sure I have tried with slightly different formulas with same results. Graal really shines at compiling calculations done in Kotlin.
And now for a even bigger surprise, the Knapsack test:
Here Graal is 54% slower!
Doing some profiling I found that my code was spending most of the time on the function that was generating the key for the memoization.
To be sure to be correct I ordered the set and then transformed in a string. This is a lot of unnecessary work and it relies on HashSet java implementation.
So I changed the method to generate the key from this:
private fun generateKey(choice: Set<Watch>): String = choice.sortedBy { "${it.price}-${it.weight}" }.toString()
To this:
private fun generateKey(choice: Set<Watch>): String = choice.map{ it.hashCode() }.sorted().joinToString("")
The new function is faster because it sort the hashes of Watches (which are unique) and then just join them in a String.
Note that we cannot simply use the hash of the Set because there could be a clash of hashes. I actually tried it and verified it started emitting wrong results.
It is possible to create a safer hashing method for the Set but the goal here is not to optimize the algorithm at maximum but to write efficient and clear Kotlin code.
And now the results for idiomatic Kotlin:
Here Graal is again sensibly faster than C2, and overall the new key generator is much faster than the previous implementation.
My guess at these results is that C2 is heavily optimized (using intrinsics etc.) for the typical Java usage, while Graal excels at compiling small methods and lightweight objects, which are typical of idiomatic Kotlin.
This is all worthing further investigation but I hope this post can motivate more Kotlin developers in using Graal.
If you are interested on more similar posts please follow my Twitter account @ramtop or look at my blog
Published on Java Code Geeks with permission by Uberto Babrini, partner at our JCG program. See the original article here: Comparing Kotlin performance with Graal and C2 Opinions expressed by Java Code Geeks contributors are their own. |