We Analyzed 30,000 GitHub Projects – Here Are The Top 100 Libraries in Java, JS and Ruby
One of the biggest dilemmas developers face every day is which software libraries to use. Go with the hot new framework or the “boring” tried-and-tested one that’s been around for 10 years? One of the main things that make frameworks successful is their communities of users and contributors. While it can be easy to know how many people contribute to a project (especially if it’s open source), it’s pretty hard to know how many are actually using it. We decided to take a data-driven approach to answer these questions.
GitHub hosts more than a million projects today. Projects range from small utilities and test apps all the way to massive infrastructure projects with hundreds of contributors. As such, it provides a fairly diverse and up-to-date dataset to explore, one which is also indicative of the trends in closed-source and enterprise software.
We chose the 3 top languages on GitHub – Java, Ruby and JavaScript. For each one we analyzed 10,000 projects (i.e. GitHub repositories) leaning towards those that have been favorited the most by developers.
We identified the 100 most commonly used components for each language and grouped them into categories (e.g. Testing, DB, UI). It’s pretty interesting to see how these differ between the languages.
Here are some notable findings and the top 10 libraries for each language (you can find the full list at the bottom of this post):
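The post does not describe the counting pipeline itself, so the following is only a minimal Python sketch of how such a tally could work. It assumes each repository has already been reduced to a list of declared dependency names (for instance from a Gemfile, package.json or pom.xml). The function name `top_libraries` and the sample data are hypothetical.

```python
from collections import Counter


def top_libraries(projects, n=100):
    """Return the n most used libraries as (name, project_count, reach) tuples.

    projects maps a repository name to the list of libraries it declares.
    A library is counted once per project, however often it is listed.
    """
    counts = Counter()
    for deps in projects.values():
        counts.update(set(deps))  # set() avoids double-counting within a project
    total = len(projects)
    return [(name, count, count / total) for name, count in counts.most_common(n)]


# Hypothetical sample data, not taken from the analysis.
sample = {
    "repo-a": ["rails", "rspec", "sqlite3"],
    "repo-b": ["rails", "rspec", "rspec"],
    "repo-c": ["sinatra", "thin"],
    "repo-d": ["rails"],
}

for name, count, reach in top_libraries(sample, n=2):
    print(f"{name}: {count} projects ({reach:.0%} reach)")
```

Counting each library once per project is a design choice: it makes the "entries" figure mean "number of projects using the library" rather than "number of times it is mentioned".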
Ruby
- SQL still dominates. While NoSQL databases are all the rage these days, relational databases (SQL) still dominate the Ruby world – SQLite, PostgreSQL and MySQL are used in 25% of the projects, while Redis and MongoDB appear in only 3% of the projects.
- MongoDB is, however, still popular in Ruby with 185 entries, which is twice as many projects as in Java.
- In web development we see that while new frameworks have gained traction in the last few years (such as Sinatra with 570 entries), Ruby is still centered around Rails, with over 7,000 projects. For web servers, Thin (with 487 entries) is used by twice as many projects as Unicorn.
- CoffeeScript, a new language layer on top of JavaScript, seems to be well received by Ruby web developers, with over 1,000 projects.
- Twitter has also made a big impact in Ruby with 3 libraries in the top 100 and 382 projects using them. While that’s pretty big, it’s still not quite as big as Google’s influence on Java, as we’ll see in a second.
JavaScript
- JS is fragmented. The top components’ reach in Java is 30% of projects. For Ruby it’s about 20%. For JS it’s not even 10%. As JavaScript is rapidly evolving to support more types of applications, a lot of new capabilities have not yet been absorbed into the language or standard libraries. As a result we see 50% more frameworks used in JavaScript than in Ruby and Java in the top 100, echoing the fact that it’s still early days for the language.
- Grunt is huge. The Grunt automation framework plays a very big role in JS development (especially for Node.js) with 23% of the top 100 libraries plugging into it. Grunt seems to be filling the gap in the build, testing and deployment cycle in JS. In languages such as Java, this is handled outside the project by other prominent tools such as Maven or Jenkins.
- Networking is still a big problem. A large part of JavaScript libraries (7% of the top 100) focus on networking and client/server communication. That’s 3 times more than in Java and Ruby. This is most likely due to web developers having to deal with a fragmented ecosystem on the browser side, and the relatively early state of the server stack.
- For server-side web development, the Express framework for Node.js is leading the chart with 631 entries.
- Striving toward structure. JavaScript also features the largest number of language extensions with 844 entries. It’s interesting to see that while JavaScript is a very flexible language, developers are looking for ways to mold it into something more structured. Underscore.js, which provides functional programming capabilities similar to those found in more structured languages such as Scala, has 416 entries, making it the 5th most prevalent JS library.
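Figures such as "7% of the top 100" are category shares within a language's top list. A minimal Python sketch of that calculation follows, assuming a hand-maintained mapping from library name to category. The post does not say how libraries were categorized, so the function name `category_shares`, the mapping and the mini list below are purely illustrative.

```python
from collections import Counter


def category_shares(top_list, categories):
    """Return each category's share of the libraries in top_list.

    Libraries missing from the categories mapping are grouped under "Other".
    """
    if not top_list:
        return {}
    tally = Counter(categories.get(name, "Other") for name in top_list)
    return {category: count / len(top_list) for category, count in tally.items()}


# Hypothetical mini "top list" and category mapping, for illustration only.
top_list = ["express", "underscore", "grunt", "socket.io", "mocha"]
categories = {
    "express": "Web",
    "underscore": "Language extensions",
    "grunt": "Build",
    "socket.io": "Networking",
}

for category, share in sorted(category_shares(top_list, categories).items()):
    print(f"{category}: {share:.0%}")
```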
Java
- It’s Guava season – Google code has gone mainstream. Spring and Apache libraries are so prevalent they’re practically a part of the language, with over 25% of the top 100 libraries split fairly evenly between the two. Something a bit surprising is the prevalence of Google-made libraries, such as GWT and Guava, in Java, with 7% of the top 100. Seems like there’s one more area in our life which Google has a big part in.
- BigData – Hadoop is leading the chart. Data processing is a big part of Java with 16 of the top 100 libraries focusing on database management, compared to 12 in Ruby and 5 in JavaScript (admittedly still a much more client-side language).
- It’s interesting to see that Hadoop is living up to its promise as the leading big data technology with 168 entries. To put that in perspective, MySQL, one of the most well-known and common SQL DBs, has 225 entries. PostgreSQL, another well-known relational DB, has 121.
- ElasticSearch, a new technology for searching across large data sets, is also doing quite well on GitHub with over 100 projects using it.
- Test driven development (TDD) is huge in Java and Ruby (still not in JS) – across all three languages we see testing play a very big role. In Java and Ruby, 40-50% of projects reviewed are using an automated testing framework. The leading ones are JUnit in Java and RSpec in Ruby. In JavaScript, the percentage of projects using a testing framework is considerably lower, coming in at 25%.
- Mocking, a method for simulating real-world objects in testing and development, has gained a lot of traction with 10% of the projects in Java and 7% in Ruby applying it. In JavaScript mocking is still almost nonexistent.
Click here to see the complete top 100 libraries list.