We Analyzed 60,678 Libraries on Github – Here are the Top 100
What are the top Java libraries used by some of the most popular projects on Github? Based on analyzing 60,678 dependencies
We like backing up everything we say with data, that’s why some people claim we’re not that fun at parties. Obviously, they’re going to the wrong parties. In this post we’ve looked into 60,678 import statements of 11,939 unique Java libraries that are used by the top 5,216 Java projects on Github – and extracted the top 100 to a list. Or how we like to call it, a fun way to spend a rainy weekend.
There’s a tension between new rising technologies and good ol’ tried and tested libraries that we all like to use. The new libraries and frameworks tend to generate more buzz up to a point where it seems everybody are using them and you’re left behind the curve – This is often NOT the case and this post brings the numbers to prove it.
New Post: We Analyzed 60,678 Libraries on Github – Here are the Top 100 http://t.co/fsF79bTMB0 pic.twitter.com/zrtM0Em3nS
— Takipi (@takipid) April 14, 2015
Without Further Ado: The Top 20 Java Libraries
The Main Insights From the Top 100 Libraries List
The Unexpected
- Hadoop Blows Spark Out of the Water – Hadoop comes in at #42 with no mention of Apache Spark in the top 100 list whatsoever. Apache Zookeeper made it to #75, helping maintain Hadoop clusters and keeping the elephants at bay.
- Log4j is 2x More Popular Than Logback – We clearly see that Log4j, which is used in 16.76% of the projects we examined, is outrunning Logback that’s used as the logging engine only behind 8.45% of the top projects.
- SQL > MongoDB > PostgreSQL – The Java SQL connector came in at #27, MongoDB showed up in #87, and PostgreSQL barely made the list at #97.
- ElasticSearch has the Most Justified Buzz Around a Java Library – ElasticSearch, the search server based on Apache Lucene (which made #90 in the list), the E in the ELK stack, and a personal favorite of ours, is the library with the most justified buzz we have on the list.
And… The Usual Suspects
- JUnit is the Undisputed King of Java Libraries – With 3,345 entries, 64% of Github’s top Java projects imports are set on JUnit. Followed by spring-test on the Spring front and testng, these are the top 3 Java testing libraries that we saw in the top 20 list.
- SLF4J is the Most Popular Logging Library – Whether you’re using Log4j, Logback or any other logging engine, with 1,184 entries over 22% of Github’s top Java are using slf4j has their logging facade.
- 14 Out of the Top 100 Libraries are Coming From the Spring Framework – The most popular framework among the top 100 libraries (even more than apache-commons which has 12 libraries in the top 100), with spring-context as its most popular library.
- Google Guava Rocks the Charts as the #4 Most Popular Java Library – With 815 entries which make 15.6% of Github’s top Java projects. We actually love using Guava here at Takipi as well and recently published a post about some of its useful yet lesser known features.
- apache-commons is Really Common Coming in at #5 – With its top representative holding 659 import statements (12.63%) in Github’s top Java projects and 12 of its libraries in the top 100, apache-commons continues to justify its name.
- Mockito is the Most Popular Java Mocking Framework – 559 entries (10.72%) show that mocking makes it big in Java, ranking as the 7th most popular library.
- Developers Love Using joda-time – This comes as no surprise but it’s interesting to see the joda-time library by Stephen Coulbourne reach the 18th place.
5 More Entries Worth Mentioning
- #65 – Bukkit – The only gaming library in the top 100 list, you guessed it right, Minecraft servers.
- #66 – Jetty – Because Netty didn’t make it to the list.
- #81 – PowerMock – A fresh entry to the top 100 list, states that “it can be used to solve testing problems that are normally considered difficult or even impossible to test”.
- #90 – Google Protobuf – A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
- #100 – AssertJ – Rising in popularity over the last year and also included in the new version of Dropwizard, one of the popular new testing libraries, accepting migrations from FEST Assert.
Top 100 Libraries by Type
To get a better sense of the the types of libraries that gather the most attention from the Java community, we’ve plotted the top 100 by type and their number of uses in Github’s most popular Java projects.
These numbers, where do they come from?
Let’s add some context to the stats: For starters we’ve pulled out the top 25,000 Java projects from Github by stars. On the second step we extracted the ones who use either Maven or Ivy for dependency management to gain quick access to their pom.xml / ivy.xml dependencies, this left us with 5,216 projects. Now that we had thousands of xml dependencies on hand, it was time to get a beer. Once we ran our of beer, we crunched out the data and got a total of 60,678 records of libraries in use with 11,939 unique libraries on hand. This means the average Github project in our dataset uses 11.6 external libraries. To make the analysis easier, we’ve processed the stats for the top 100 libraries by the number of Github projects they appear in. And added some classification by the type of library just for the heck of it.
The raw data is available right here and you’re welcome to take a look and make sure we didn’t miss any interesting insights. Although the beer drinking phase was an essential part of this research, the numbers are accurate.
Further Reading
Another interesting analysis comes from apiwave who looked into the top Java apis used by number of classes at each client’s project. The analysis was inspired by a previous post we’ve published in November 2014.
And what about the top tools Java developers use?
We’ve got you covered right here: The Top 15 Tools Java Developers Use After Major Releases
Seeing anything that we missed in the data? Please let us know in the comments section below.
Reference: | We Analyzed 60,678 Libraries on Github – Here are the Top 100 from our JCG partner Alex Zhitnitsky at the Takipi blog. |
Looked at your source of what you are testing for ‘groovy’ and that will pop up for dependency reports or other things (if they are included in the project commit. Are you just testing for ‘imports’ at a ROOT level?? Because these would just be ‘Scripts’ then or ‘Groovy Scripts’ :)
Alternately you could do this using mobile data but this may get expensive if
you are not on a data strategy with your operator.