Web Framework Benchmarks – Round 10
TechEmpower Web Framework Benchmarks is a collaborative, open web framework benchmarking project that I blogged about last year (see An open web application framework benchmark). Since the last write-up a new benchmark round has been run with lots of new test implementations. Among other things, the round 10 benchmark run includes support for the Cassandra NoSQL database, as well as a new Java-based test implementation leveraging Servlet 3 asynchronous processing.
I did some non-scientific analysis of the TFB round 10 results, and here are a few observations on the Peak environment, "best" concurrency level results. According to the documentation, the round 10 and round 9 Peak hosting hardware specifications are the same, so the results should be directly comparable.
JSON serialization test
lwan has overtaken the previous round's #1 performer in the JSON serialization test by a large margin, more than doubling the round 9 winner's throughput. cpoll_cppsp, last round's winner, also improved its throughput by nearly 700k req/s. Six contestants from round 9 made it into the top 10 in round 10, and all of them improved their performance. OpenResty managed to more than double its round 9 throughput.
My new entrant, servlet3-cass, took 5th place in this category, making it the second best performing Java implementation, losing only slightly to Netty. Based on this, I would conclude that servlet container overhead doesn't currently seem to impose a significant bottleneck in this test. It's interesting to note that there's a difference of about 400 req/s between the servlet and servlet3-cass results, even though the implementations are very similar. Both are implemented on the Servlet API, run on the Resin servlet container and use Jackson for JSON processing. The only difference between the two seems to be the Jackson JSON library version.
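For context, the JSON serialization test does nothing more than serialize a small message object to JSON on every request. The sketch below shows roughly what a Servlet API + Jackson based implementation of that test looks like; it is illustrative only, not the actual servlet or servlet3-cass code, and the class name and URL mapping are my own.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.fasterxml.jackson.databind.ObjectMapper;

// Minimal, illustrative TFB-style JSON serialization test servlet.
@WebServlet("/json")
public class JsonServlet extends HttpServlet {

    // ObjectMapper is thread-safe and relatively costly to construct,
    // so a single shared instance is reused across requests.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // The benchmark response body is {"message": "Hello, World!"}.
    public static final class Message {
        public String getMessage() {
            return "Hello, World!";
        }
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/json");
        MAPPER.writeValue(resp.getOutputStream(), new Message());
    }
}
```

With a payload this small, the test mostly measures request dispatch and framework overhead rather than serialization cost, which fits the observation above that the servlet container doesn't seem to be a significant bottleneck here.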
The top 10 test implementations were based on the following programming languages: C, C++, Java and Lua. Of the top 10 performers, 5 are Java based, and Ur and Go dropped out of the top 10 since the previous round.
Single database query test
Since round 9 the #1 performer's throughput has improved only marginally in this test category. C++ based test implementations dominate the top 4 spots, with Lua, Ur and Java also in the top 10. Only test implementations based on relational databases made it to the top 10, leaving out MongoDB and Cassandra.
Some test implementations seem to have multiple entries in the results table for this test (e.g. cpoll_cppsp-postgres, undertow and activeweb). I guess this is because the test was run multiple times, but I would've expected each framework to be listed only once in the best results table. Also, what is the exact test methodology in these cases? Under which conditions does a framework get multiple tries? Some test implementations also seem to have experienced performance degradation since round 9; one such example is OpenResty. It would be interesting to analyse the cause further: was the degradation caused by changes in the test implementation, the infrastructure setup or the test methodology?
I was a bit disappointed with the performance of the servlet3-cass test implementation in the database tests. Unfortunately, there's no resource usage or profiling data available for the test run, and as I don't have access to a real performance test environment myself, it's difficult to analyse potential bottlenecks further.
Because servlet3-cass was able to reach 14 times its single database query throughput in the JSON test, I would reason that no inherent app server or framework bottleneck is being hit in the database test. By comparison, the Java and Servlet API based test implementation that uses a MySQL database was able to achieve nearly twice the throughput. Two notable implementation differences come to mind between the two test implementations: a) a different datastore and b) use of a different servlet API (synchronous vs. asynchronous).
Since Cassandra is in general very fast at reading and writing data by key, there shouldn't be any inherent reason why it should perform worse than the MySQL based implementation. Some aspects I need to analyse in more detail: 1) the Cassandra server configuration, 2) the Resin servlet container's asynchronous processing support and/or configuration, and 3) whether the thread pool sizes used for Resin asynchronous processing and the Cassandra driver are optimized for the test server's hardware concurrency level. During test implementation I ran into a couple of bugs in servlet container asynchronous processing support (in both Resin and Tomcat; kudos to both teams for fixing them!), which proved that asynchronous processing is a tricky issue not only for application developers but also for app server developers. To evaluate whether some aspect of asynchronous processing has a negative performance impact, it would be interesting to implement the test using the traditional synchronous servlet API as well.
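To make the asynchronous processing pattern discussed above more concrete, here is a rough sketch of how a Servlet 3 asynchronous handler can hand a single-key query to the Cassandra driver's asynchronous API and complete the response from a callback, assuming a DataStax Java driver 2.x style API. This is not the actual servlet3-cass code: the keyspace, table, column names, key range and pool sizes are assumptions, and error handling is stripped down.

```java
import java.io.IOException;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative async single-query servlet; names and schema are assumptions.
@WebServlet(urlPatterns = "/db", asyncSupported = true)
public class AsyncDbServlet extends HttpServlet {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Callback thread pool size is one of the tuning knobs mentioned above;
    // it should be matched to the test server's hardware concurrency.
    private static final Executor CALLBACK_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    private Cluster cluster;
    private Session session;
    private PreparedStatement selectWorld;

    @Override
    public void init() {
        cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        session = cluster.connect("tfb"); // keyspace name is an assumption
        selectWorld = session.prepare("SELECT id, randomNumber FROM world WHERE id = ?");
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Release the container thread right away; the response is completed
        // from the driver callback once the query result arrives.
        final AsyncContext ctx = req.startAsync();
        int id = ThreadLocalRandom.current().nextInt(1, 10001);
        final ResultSetFuture future = session.executeAsync(selectWorld.bind(id));

        future.addListener(new Runnable() {
            @Override
            public void run() {
                try {
                    Row row = future.getUninterruptibly().one();
                    HttpServletResponse response = (HttpServletResponse) ctx.getResponse();
                    response.setContentType("application/json");
                    MAPPER.writeValue(response.getOutputStream(),
                            new World(row.getInt("id"), row.getInt("randomNumber")));
                } catch (IOException e) {
                    // A real implementation would log and report the error.
                } finally {
                    ctx.complete();
                }
            }
        }, CALLBACK_POOL);
    }

    // Simple result object serialized by Jackson via its getters.
    public static final class World {
        private final int id;
        private final int randomNumber;

        World(int id, int randomNumber) {
            this.id = id;
            this.randomNumber = randomNumber;
        }

        public int getId() { return id; }
        public int getRandomNumber() { return randomNumber; }
    }

    @Override
    public void destroy() {
        cluster.close();
    }
}
```

The callback executor here is separate from the container's request threads; whether such a pool, the driver's connection pool and the container's async support are sized coherently for the benchmark hardware is exactly what points 2) and 3) above are about.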
Multiple database queries test
The two best implementations in this test, start and stream, were able to hold on to their round 9 positions, while both improved their throughput by more than 30 %. The top 10 test implementations are based on Dart, Java, C++ and Clojure.
MongoDB based implementations claimed a triple win in this test, with the next four implementations being PostgreSQL based.
Fortunes test
Five new test implementations made it to the top 10 compared to the previous round. The top 10 implementations were written in C++, Ur and Java. Again, both undertow and undertow edge seem to be included three times in the round 9 data table, which looks a bit strange for a best results table.
Database updates test
Node.js based implementations, which occupied the top 3 places in round 9, have seen their lead overtaken by C++ based implementations. Rather than raw processing efficiency, this test type is expected to primarily stress the backend datastore, the datastore driver and the parallelism of the framework, so C++ as a language should not have an inherent advantage as such. The top 10 implementation languages in this test are: C++, JavaScript, Scala, Dart and Perl.
Having done quite a bit of Perl programming years ago, I was quite fascinated to see an implementation based on the old scripting workhorse make it into the top 10.
In this test there has also been a performance degradation of about 10 % for the top 5 implementations.
In round 9 the top 10 implementations were all based on MySQL, whereas in round 10 three PostgreSQL based implementations have entered the top 10.
Plaintext test
The top 10 in this category is occupied by test implementations based on C++, C, Java and Scala. While there has been only a modest performance increase for the #1 position, the top spot has been taken over by ulib. Four new entrants have made their way into the top 10, and Netty gained a throughput increase of over 40% compared to the previous round.
The development process
The TechEmpower team has again put a huge effort into the FrameworkBenchmarks project and done a great job at it! Still, as with everything, there's always room for improvement, and from a casual test implementation contributor's perspective the following issues should be considered:
- a more predictable release schedule. I realize the TE team needs to prioritize actual customer projects over pro bono style work. However, many test implementers are in a similar position, so having a predictable release plan would help contributors schedule their work. It seems that at least part of the delays may have been caused by scope creep, so a stricter release policy could also help make the release schedule more predictable.
- release phase change notifications. Not all contributors are able to follow the TFB Google group discussions. Having a notification mechanism for informing contributors about release schedule phase changes would be helpful. This could be as simple as a GitHub issue to subscribe to or a separate Google group for announcements.
- enable test implementation logging during preview runs (functional test run). I found it very difficult to troubleshoot infrastructure deployment automation related bugs during the preview runs. Allowing server side logging for preview runs could be a huge help for test implementers troubleshooting their code.
- provide resource consumption statistics for preview runs. The only performance metric a test implementer currently gets is the throughput figure per test and concurrency level. This gives the implementer very little to go by in terms of optimization feedback.
- provide access to app server, DB server etc. logs. Additional infrastructure logs would help in identifying functional and performance related issues.
- performance related data gathering could even be taken further by running test implementations using a profiling tool during preview runs.
Benchmark results visualization tool updated
I've updated the TFB results visualization tool I created to also render round 10 results. As before, the tool can be found here: http://tfb-kippo.rhcloud.com/
A new Scala + ElasticSearch based test implementation
The project team has been hard at work and has published a tentative release schedule for round 11 results, which are aimed to be released by the end of June 2015. Since round 10, I've contributed support for the ElasticSearch search server as well as a Scala / Spray based test implementation. Both have been merged into the project repository, so the results should be available when the round 11 data gets published.
- The code for the new test implementation can be found here: https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Scala/spray-es
Again, great job TFB development team and contributors – keep up the good work!
Looking forward to seeing round 11 results!
Reference: Web Framework Benchmarks – Round 10 from our JCG partner Marko Asplund at the Practicing Techie blog.