Core Java

C code always runs way faster than Java, right? Wrong!

We all know the prejudice: Java, being interpreted, is slow, while C, being compiled and optimized, runs very fast. As you might know, the picture is quite different.

TL;DR: Java is faster for constellations where the JIT can perform inlining, as all methods/functions are visible, whereas the C compiler cannot perform optimizations across compilation units (think of libraries etc.).

A C compiler takes C code as input, compiles and optimizes it, and generates machine code for a specific CPU or architecture. The result is an executable that can be run directly on the given machine without further steps. Java, on the other hand, has an intermediate step: bytecode. The Java compiler takes Java code as input and generates bytecode, which is basically machine code for an abstract machine. For each (popular) CPU architecture there is a Java Virtual Machine, which simulates this abstract machine and executes (interprets) the generated bytecode. And this is as slow as it sounds. On the other hand, bytecode is quite portable, as the same output will run on all platforms, hence the slogan “Write once, run anywhere”.

With the approach described above it would rather be “write once, wait everywhere”, as the interpreter would be quite slow. So what a modern JVM does is just-in-time (JIT) compilation: the JVM internally translates the bytecode into machine code for the CPU at hand. As this process is quite complex, the HotSpot JVM (the one most commonly used) only does this for code fragments that are executed often enough (hence the name HotSpot). Besides faster startup (the interpreter starts right away, the JIT compiler kicks in as needed), this has another benefit: the HotSpot JIT already knows which parts of the code are called frequently and which are not, so it can use that knowledge while optimizing the output. This is where our example comes into play.
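The warm-up behaviour described above can be observed with a small experiment. The following is only a rough sketch, not a proper benchmark: the class name WarmupSketch and the methods work and timeRoundNanos are made up for illustration, and the timings depend entirely on the machine and JVM. The idea is to call the same hot method several times and print how long each round takes.

```java
// Hypothetical demo class; all names here are illustrative, not from the article.
final class WarmupSketch {

    // A small, hot method: sums i % 7 for all i in [0, n).
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Runs work(n) once, stores the result in resultHolder[0]
    // and returns the elapsed wall-clock time in nanoseconds.
    static long timeRoundNanos(int n, long[] resultHolder) {
        long start = System.nanoTime();
        resultHolder[0] = work(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] result = new long[1];
        for (int round = 1; round <= 10; round++) {
            long nanos = timeRoundNanos(5_000_000, result);
            // The result is printed so that the computation is actually used.
            System.out.println("round " + round + ": " + (nanos / 1_000)
                    + " us, result " + result[0]);
        }
    }
}
```

On a HotSpot JVM the first round or two will typically be slower than the later ones, though the exact pattern is not guaranteed. Running the same class with the standard HotSpot option -Xint (interpreter only) or with -XX:+PrintCompilation (log JIT activity) is one way to see the difference between interpreted and compiled execution.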

Before having a look at my tiny, totally made-up example, let me note that Java has a lot of features, like dynamic dispatch (calling a method on an interface), which also come with runtime overhead. So Java code is probably easier to write but will still generally be slower than C code. However, when it comes to pure number crunching, like in my example below, there are interesting things to discover.
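To make "dynamic dispatch" a bit more concrete, here is a minimal sketch. The names DispatchSketch, Op and AddOne are hypothetical and exist only for this illustration. Both loops compute the same sum, but the first one calls through an interface, so the target method is only known at runtime, while the second calls a static method whose target is fixed.

```java
// Hypothetical demo class; all names are illustrative assumptions.
final class DispatchSketch {

    // The call site in sumViaInterface only sees this interface,
    // not the concrete implementation behind it.
    interface Op {
        int apply(int i);
    }

    static final class AddOne implements Op {
        @Override
        public int apply(int i) {
            return i + 1;
        }
    }

    // Static counterpart: the call target is known without a lookup.
    static int addOne(int i) {
        return i + 1;
    }

    // Dynamically dispatched: op.apply is resolved at runtime.
    static int sumViaInterface(Op op, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += op.apply(i);
        }
        return sum;
    }

    // Statically bound: the same loop calling the static method.
    static int sumViaStatic(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += addOne(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaInterface(new AddOne(), 10));
        System.out.println(sumViaStatic(10));
    }
}
```

Whether the interface call is measurably slower depends on the JVM. A JIT can often optimize such a call site when it only ever sees one implementation, so this sketch shows the language feature rather than a guaranteed cost.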

So without further talk, here is the example C code:

test.c:

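/* Both functions are defined in test1.c, a separate compilation unit. */
int compute(int i);

int test(int i);

int main(int argc, char** argv) {
    /* Note: sum overflows int along the way; the value is irrelevant here,
       only the number of calls matters for the timing. */
    int sum = 0;
    for (int l = 0; l < 1000; l++) {
        int i = 0;
        while (i < 2000000) {
            if (test(i)) {
                sum += compute(i);
            }
            i++;
        }
    }
    return sum;
}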

test1.c:

int compute(int i) {
    return i + 1;
}

int test(int i) {
    return i % 3;
}

What the main function actually computes isn’t important at all. The point is that it calls two functions (test and compute) very often and that those functions are in another compilation unit (test1.c). Now let’s compile and run the program:

> gcc -O2 -c test1.c

> gcc -O2 -c test.c

> gcc test.o test1.o

> time ./a.out

real    0m6.693s
user    0m6.674s
sys    0m0.012s

So this takes about 6.7 seconds to perform the computation. Now let’s have a look at the Java program:

Test.java:

public class Test {

    private static int test(int i) {
        return i % 3;
    }

    private static int compute(int i) {
        return i + 1;
    }

    private static int exec() {
        int sum = 0;
        for (int l = 0; l < 1000; l++) {
            int i = 0;
            while (i < 2000000) {
                if (test(i) != 0) {
                    sum += compute(i);
                }
                i++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        exec();
    }
}

Now let’s compile and execute this:

> javac Test.java

> time java Test

real    0m3.411s
user    0m3.395s
sys     0m0.030s

Taking 3.4 seconds, Java is considerably faster for this simple task (and this even includes the slow startup of the JVM). The question is why? The answer, of course, is that the JIT can perform code optimizations that the C compiler can’t. In our case it is function inlining. As we defined our two tiny functions in their own compilation unit, the compiler cannot inline them when compiling test.c. The JIT, on the other hand, has all methods at hand and can perform aggressive inlining, and hence the compiled code is way faster.
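To illustrate what inlining means here, the following sketch writes the effect out by hand at the source level. It is not what the JIT literally produces (the JIT works on bytecode and emits machine code), and the class name InlineSketch as well as the rounds and limit parameters are assumptions made for this example. exec calls test and compute just like the article's program; execInlined has their bodies pasted into the loop, which removes the call overhead and lets the optimizer see the whole loop at once.

```java
// Hypothetical demo class; a hand-written picture of inlining, not JIT output.
final class InlineSketch {

    static int test(int i) {
        return i % 3;
    }

    static int compute(int i) {
        return i + 1;
    }

    // Same structure as the article's exec(), with the loop bounds
    // turned into parameters so small inputs can be tried.
    static int exec(int rounds, int limit) {
        int sum = 0;
        for (int l = 0; l < rounds; l++) {
            int i = 0;
            while (i < limit) {
                if (test(i) != 0) {
                    sum += compute(i);
                }
                i++;
            }
        }
        return sum;
    }

    // The same loop with the bodies of test() and compute() inlined by hand.
    static int execInlined(int rounds, int limit) {
        int sum = 0;
        for (int l = 0; l < rounds; l++) {
            int i = 0;
            while (i < limit) {
                if (i % 3 != 0) {   // was: test(i) != 0
                    sum += i + 1;   // was: sum += compute(i)
                }
                i++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both variants compute the same value; only the call structure differs.
        System.out.println(exec(3, 1000));
        System.out.println(execInlined(3, 1000));
    }
}
```

A C compiler does the same transformation when the function bodies are visible in the translation unit being compiled. In the article's setup they are not, which is why the calls remain in the C binary.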

So is this a totally exotic and made-up example which never occurs in real life? Yes and no. Of course it is an extreme case, but think about all the libraries you include in your code. None of those functions can be considered for inlining when your C code is compiled, whereas in Java it does not matter where the bytecode comes from. As it is all present in the running JVM, the JIT can optimize to its heart’s content. Of course there is a dirty trick in C to lower this pain: macros. This is, in my eyes, one of the major reasons why so many libraries in C still use macros instead of proper functions, with all the problems and headaches that come with them.

Now before the flame wars start: both of these languages have their strengths and weaknesses, and both have their place in the world of software engineering. This post was only written to open your eyes to the magic and wonders that a modern JVM makes happen each and every day.


25 Comments
Shai Almog
8 years ago

Change the main call in the Java code to System.out.println(exec()); and see if it still beats C. My bet is that it won’t. JITs are pretty amazing and allocation in Java is actually faster than the default C malloc code. GC allows us to use multi-core CPUs more effectively than manual memory management, but unfortunately those benefits are really hard to prove/measure & utilize. E.g. in your case above the JIT can notice that the value isn’t used and effectively optimize the method away completely. While this does show the power of dead code elimination it’s not quite a real…

Andreas Haufler
8 years ago

Shai, thanks for the hint. While minimizing the code I completely overlooked that. The first test actually had two loops to warm up the JIT. However, in this case, the JIT will not eliminate the code, so the measurement is quite accurate. I added a println to prove that (lucky me ;-)

Shai Almog
8 years ago

Odd. If the time difference were 10% it would make sense. A 2x time difference makes it look like something is wrong in the C code, the compiler options or something else.

I agree that Java is plenty fast and in some cases can beat C but if the number difference is too big it usually means something is problematic in the test.

Duga
8 years ago
Reply to  Shai Almog

Hint: merge the *.c files into one. GCC/Clang perform inline optimization within a single file, not beyond it.
A mystery solved? I like “Java beats C” discussions. Optimized Java code cannot be quicker than optimized C (many people like comparing unoptimized code against optimized code). The reason is simple: the JVM (or .NET) machine. You can keep reducing the JVM overhead, but you cannot eliminate it.

gcc 12s, clang 11s before merging
gcc: 5.98s clang 6.1s after merging

java ok 6s
clang version 3.6.2
gcc version 5.2.0 (GCC)

Java version: 1.8.0_51, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-openjdk/jre

Andreas Haufler
8 years ago
Reply to  Duga

Now let’s assume that the two methods in test1.c are in a shared library: neither the compiler nor the linker can optimize this. The JIT, on the other hand, can optimize across classes and jars. Of course this is an edge case, as I have written in the first paragraph… Read the article again to learn that I never wrote that Java is faster than or on par with C. All I said is that Java is NOT ALWAYS INHERENTLY slower than C ;-)

Ondrej
8 years ago
Reply to  Duga

The same overhead of the virtual machine, which may slow the code down, can at the same time improve it, as it has more information at runtime than the C compiler has at compile time. How does the C compiler know how many cores the CPU has or how much memory is available? The JIT can also record statistics about runtime code usage and precompile and optimize the code which is used frequently, at the expense of memory. This is not possible at compile time, unless you optimize everything, but then at a huge expense of memory usage. In the end,…

Sergii
8 years ago

Why is the C optimization level O2 instead of O3? The C program is also linked from 2 objects, so not all compile-time optimizations are done. It will be much faster when all functions reside in one file, because compile-time optimizations will work better. Linking without optimization (-flto) is also not so fair. The Java code, on the other hand, uses static methods in the same class. I did some experiments. For the same setup as in the article I got:

C:
real 0m5.246s
user 0m4.976s
sys 0m0.000s

Java:
real 0m3.353s
user 0m3.148s
sys 0m0.008s

But the same test written in C but in…

Andreas Haufler
8 years ago
Reply to  Sergii

Sergii, I think you did not read the article. Having to have two compilation units is the whole point of the article. Simply assume the functions in test1.c are in a shared library. There is no way for the compiler or the linker to optimize this at compile time. On the other hand, the JIT in the JVM can optimize across classes and jars. This is in the very nature of a runtime compiler. Of course I simplified the code in the article as much as possible. Of course it is an edge case. Of course C code is generally faster than…

Delian Delchev
8 years ago

Here we see a special trick, which is compiler dependent, which exploits the major problem in the C pattern and tries to compare it with the major advantage of the Java compiler: linking of external files. Because C uses object files for linking, which use a common object format designed to allow linking of object files produced by different programming languages (e.g. Fortran and C), the C compiler cannot optimize the code. What’s more, in the given example, both files are compiled separately and then linked without the option for link-time optimization. This…

Shai Almog
8 years ago
Reply to  Delian Delchev

While I mostly agree with your points, I have to nitpick on your comment regarding Java’s static methods. Static methods in Java are the exact equivalent of C functions, so he’s right in including them. If he’d used regular Java methods, on the C side he’d have to use function pointers to simulate that behavior, and in that case Java would have rightfully beaten any real-world C code with the best compiler on the market. JVMs can inline virtual calls as well, even with polymorphic objects in some cases. Naturally, C doesn’t use function pointers as often as Java…

Andreas Haufler
8 years ago
Reply to  Delian Delchev

Delian, of course you’re right. As I have stated above, this is really an extreme case in which the C compiler is limited and where Java (or basically any JIT) shines. As I also said, this is an oversimplified artificial example. But think of test1.c as a shared library which is linked at runtime. There’s no way (that I know of) that the compiler could optimize this. On the other hand, the JIT can optimize across all loaded classes as it is effectively invoked after runtime linking. Of course, in any real world example, the functions in the…

Agaton
8 years ago

Andreas, most mainstream C/C++ compilers have something called interprocedural optimization and profile-guided optimization. With those options turned on, the compiler can with ease optimize across several compilation units. Give that a try and see what you get. You may be surprised that C/C++ may not be as “static” as you think.

Andreas Haufler
8 years ago
Reply to  Agaton

Thanks for your feedback. As I’ve written above, think of using a shared library. If test1.c were a dynamically linked library, the compiler would have a hard time optimizing that away. However, the JIT is invoked basically after runtime linking is done, so it has more room to optimize in this scenario.

Having said that, I’m a fan of modern compiler technology, both “static” compilers and things like JITs. I’m always amazed at the kinds of optimizations these guys perform.

Ondrej
8 years ago

With all due respect to static compilers, it is incredible what a good JIT can do for dynamic languages, which are usually far behind static languages in speed. There is an interesting JIT compiler from Oracle Labs for several dynamic languages: http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index.html. This compiler is able to precompile dynamic code into machine code and recompile it to interpreted instructions in case it is not valid anymore.

Mike
8 years ago

“Java is faster for constellations, where the JIT can perform inlining as all methods/functions are visible whereas the C compiler cannot perform optimizations across compilation units (think of libraries etc.).”

gcc can perform optimizations across compilation units. Just use the -flto and -ffat-lto-objects options.
You will then get the same results as if all the source code were in one file.

Also, if optimizing for speed, you should really be using -Ofast rather than just -O2.

Jason Schulz
8 years ago

You could get the same performance benefit by including the test and compute methods in the same translation unit as main (or manually writing them in headers), but I think your point is that Java allows you to organize your code without having to worry about lower-level optimizations, with which I would generally agree. Modern AoT compilers do support inter-module optimization through LTO (link-time optimization) though. The same is true for tracing using PGO (profile-guided optimization). However, the rules for C/C++ are more complex and optimizing across translation units can easily expose previously hidden bugs (e.g. UB). So, at…

Michael Siegel
8 years ago

I think it’s a great article, and average developers are probably not aware of the difference between statically compiled and dynamically compiled, or that Java is performing runtime compilation & optimization. Thanks for the write up, very helpful.

vanduc1102
8 years ago

Can we make a comparison between M$ Office and Open Office?

Ken Fogel
8 years ago

As always, I am troubled by Java examples made up exclusively of static methods. Using static methods makes Java nothing more than interpreted C. Please compare apples to apples and instantiate the test object in main() and call a method to start the ball rolling.

Nico Liberato Candio
8 years ago

Hi guys, am I missing something, or are my results completely unexpected?

nico@nico-desktop:~/linux_stuff$ uname -a
Linux nico-desktop 2.6.32-74-generic-pae #142-Ubuntu SMP Tue Apr 28 10:17:31 UTC 2015 i686 GNU/Linux

(gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) ) without optimization flag enabled
nico@nico-desktop:~/linux_stuff$ time ./a.out
real 0m11.588s
user 0m11.549s
sys 0m0.000s
(gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) ) with -O2
nico@nico-desktop:~/linux_stuff$ time ./a.out
real 0m3.037s
user 0m3.024s
sys 0m0.000s

java version “1.8.0_72”
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) Client VM (build 25.72-b15, mixed mode)

nico@nico-desktop:~/linux_stuff$ time java Test
real 0m8.106s
user 0m8.045s
sys 0m0.024s

TRAORE
7 years ago

The article doesn’t say that Java is faster than C, no! It just says that in some cases (think of libraries etc.) Java (with the JIT) can compete with C. That means that with the JIT, Java has made a lot of progress in terms of performance.
Great article, thanks.

David
2 years ago
Reply to  TRAORE

The title of the article implies that there are situations where an entire Java program can be faster than C just because it omits the important fact that Java can’t compete with C for performance except in this very specific situation. Maybe this should have been called “Java can be made faster than C if you build just the right Java program and also avoid some very useful C optimizations.” I’ll have to double-check but I think that C inline function calls were not used so that the C program would be slower than if it was designed properly. But…

Andy
2 years ago
Reply to  David

I think the compiler is smart enough to inline automatically without the keyword. However, across distinct compilation units it can’t (unless link-time optimizations are enabled). In that case the C code would be on par with Java (if not faster)…

David
2 years ago

If only there was a way to have the C compiler make functions inline. I’m just wondering if that was ignored because of a bias towards Java. Java is great at what it does but no one writes a first-person shooter game in Java for a reason.

Andy
2 years ago
Reply to  David

It will inline automatically if possible. Take the Rust compiler as an example, which has a somewhat different approach to libraries/crates that permits such inlining more easily. Of course it’s an edge case and not a request to move number crunching to Java. Still, the JVM and its JIT are an amazing piece of technology and have come a long way…
