Creating software for sysops – make sure you do not suck
Plumbr is all about detecting performance problems from within Java applications. Whether the application resides on a desktop machine under a developer’s desk or is hidden in a production vault guarded by the Bastard Operator From Hell does not matter. We have designed our software to cover both ends of the spectrum. Or so we thought.
The past few months have made us doubt our wisdom. Something just didn’t feel right, as there was suddenly more friction between the users and the product. So we finally took the time to step back and check whether our assumptions corresponded to reality.
The results were staggering – at times it seemed as if someone had deliberately designed certain aspects of our service with “getting even with operations” in mind. I will illustrate this with some examples, most of which should be applicable to any B2B software company:
Installation. Installation has to be easy. What can be easier than clicking on the downloaded JAR file and pressing NEXT a few times?
Wrong, especially when your software is intended to run in the dark corners of a server room. Instead of Swing-based UIs you should start thinking in terms of rpm -ivh yourpackage.rpm and its close relatives such as dpkg or yum.
To make things worse, organizations use different tools to support their release processes. Before you can say “I got this package management thing covered” your support channel will be overrun. Requests to embed your installation into shell scripts, continuous integration tools, release management software or configuration management tools will come flooding in. And just when you think you have them all handled, there will be the next Ansible just around the corner requiring your attention.
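One way to stay embeddable is to let every choice the wizard would ask for arrive as a command-line option, so that a shell script or a configuration management tool can drive the installation unattended. Below is a minimal sketch of that idea in Java. The class name HeadlessInstallOptions, the option name --install-dir and the default path are hypothetical and only serve the illustration; they are not part of any Plumbr distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parse "--name=value" options so an installer can run
// without any UI. All option names here are illustrative assumptions.
final class HeadlessInstallOptions {

    // Turns ["--install-dir=/srv/agent"] into a map of option name to value.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 3) {
                // Fail fast with a clear message; a script cannot click "Back".
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(args);
        // A default keeps the common case down to a single command.
        String dir = options.getOrDefault("install-dir", "/opt/example-agent");
        System.out.println("Installing into " + dir);
    }
}
```

A malformed option ends the program with an uncaught exception and therefore a non-zero exit status, which is what lets a calling script or CI job notice the failure and stop.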
License server. Many B2B solutions are licensed in capacity-limited formats. Whether a particular piece of software is licensed by the number of users, server capacity or transaction volume does not matter that much. Your enterprise-level clients wish to have transparency and control over their licensing deals. So you toss in your own custom-built licensing server.
Only to discover hate mail filling your inbox the day after releasing your license server. Apparently installing 80 different license servers from 80 different vendors is not something ops people are too eager to deal with. So before designing your proprietary solution, take a look at common licensing formats and make sure your licensing solution can easily be integrated into corporate licensing solutions.
API support. Of course you need to provide an API to your service. What could be more appropriate than publishing your interfaces as MBeans?
Well, if you bother to look up from your Java developer seat then you might again be surprised. Outside the Java world, JMX is about as well-known and widely used as Microsoft’s WebTV. But give operations an API they actually can and want to use and you will discover them using your product in ways you did not even think about, such as publishing alerts created by your service to instant messaging solutions or embedding the statistical information into custom-built company-wide dashboards. I bet you did not think about that.
The key here is also in publishing raw data. Apparently operations want the freedom to aggregate their data themselves.
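To make this a bit more concrete, here is a minimal sketch of handing out raw, unaggregated samples as JSON over plain HTTP, something that curl or a dashboard can consume without knowing anything about JMX. It uses the HTTP server bundled with the JDK (com.sun.net.httpserver). The class name RawMetricsEndpoint, the /metrics/raw path and the field names are assumptions made for the example, not Plumbr’s actual API, and the sample names are assumed to need no JSON escaping.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: expose raw samples as JSON instead of MBeans.
final class RawMetricsEndpoint {

    // One raw measurement; the field names are illustrative assumptions.
    static final class Sample {
        final long timestampMillis;
        final String name;
        final double value;

        Sample(long timestampMillis, String name, double value) {
            this.timestampMillis = timestampMillis;
            this.name = name;
            this.value = value;
        }
    }

    // Serializes every sample as-is: no averaging, no bucketing.
    // Assumes names contain no characters that need JSON escaping.
    static String toJson(List<Sample> samples) {
        return samples.stream()
                .map(s -> "{\"ts\":" + s.timestampMillis
                        + ",\"name\":\"" + s.name
                        + "\",\"value\":" + s.value + "}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Starts the server; passing port 0 asks the OS for any free port.
    static HttpServer start(int port, List<Sample> samples) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics/raw", exchange -> {
            byte[] body = toJson(samples).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

After calling RawMetricsEndpoint.start(8080, samples), a plain curl http://localhost:8080/metrics/raw returns the list, and whoever consumes it is free to aggregate the numbers in whatever way suits them.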
Rolling out updates. Notifying users when a new version of your software is out and recommending an upgrade – this should be a no-brainer, right?
Well, try to deploy your software to hundreds of people within a single organization and wait for the next update. Voilà, you have just prompted most of them to contact the helpdesk on the same day asking about the upgrade. Toss in frequent upgrades and you have created a nightmare for your customer.
Packaging. If you are born as a -javaagent, then it should be obvious. You want to live your life packaged as a JAR file and if lucky, get promoted to an IDE plugin.
If you somehow scrolled past the installation section above, then let me repeat the key takeaway: your solution must be embeddable in different tools and processes. You can and you will have your own face, but do not be surprised to find yourself bundled into other tools and solutions.
Logging. It shouldn’t really matter, should it? log4j, slf4j, logback – pick one and be happy with your pick. Unless you are Ceki Gülcü, you could actually just roll a die and be done with it.
Nope. Operations seem to have a strong appetite for controlling what is logged, when and how. Some of them just wish to log everything just in case and toss it to Logstash for future pattern detection analysis. Then there are those who wish to add tamper-proof certification to their logs. And then the security audit team steps in to make sure that you do not log any financial or otherwise sensitive information.
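As an illustration of the last point, here is a minimal sketch of a filter that masks anything resembling a card number before a log record reaches any handler. To keep the example free of dependencies it uses java.util.logging rather than any of the frameworks named above. The class name RedactingFilter and the 13-to-19-digit pattern are assumptions made for the example. The pattern is deliberately naive, and the filter only looks at the message text, not at message parameters, so treat it as a sketch and not as a compliance control.

```java
import java.util.logging.Filter;
import java.util.logging.LogRecord;
import java.util.regex.Pattern;

// Hypothetical sketch: mask digit runs that look like card numbers.
final class RedactingFilter implements Filter {

    // 13 to 19 consecutive digits; an illustrative assumption, not a standard.
    private static final Pattern CARD_LIKE = Pattern.compile("\\b\\d{13,19}\\b");

    // Replaces every card-like digit run with a fixed marker.
    static String redact(String message) {
        return message == null ? null : CARD_LIKE.matcher(message).replaceAll("[REDACTED]");
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        // Rewrite the message text in place, then let the record through.
        // Message parameters (record.getParameters()) are not inspected here.
        record.setMessage(redact(record.getMessage()));
        return true;
    }
}
```

It can be attached with Logger.getLogger("com.example").setFilter(new RedactingFilter()), where the logger name is again just a placeholder. What gets logged and where it ends up can then stay in the hands of operations, for example through a configuration file passed with -Djava.util.logging.config.file.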
Why didn’t we discover this sooner? The roots of the discoveries covered in this blog post are hidden deep inside the company genome. Plumbr founders have a strong background in software development. We have all had our sleepless nights trying to pull yet another milestone together. So we kind of know the mindset, tools and problems software developers are dealing with. But so far, we have not had anyone with an operations background on board.
Another potential reason for the problem to surface only now is the change we see in the product usage patterns. During the past two quarters customers have discovered they are getting more out of the product when they attach it to their production environments. This allows Plumbr to constantly monitor the evolution of their software and discover potential performance bottlenecks. Naturally, this means that our target audience started shifting more and more from developers to ops. We acknowledged the shift, but did not really verify our assumptions with the actual target audience.
Summary. For you, our reader – know thy users: the way they work, the tools they use, the way they think about the problem and the situation where they would actually use your solution. Techniques for this are not in the scope of this post, but if you are not actively engaging with your end users and do not understand their work process, day-to-day problems or tools: stop. The world is already full of crappy software. Take a step back and make sure you are not on the path to creating yet more of it.
For us – we have definitely learned a lot from you, dear friends from operations. Even though it might take some time, we are already rolling out improvement releases and will keep doing so. And I can promise that those releases will be built with the lessons learned in mind.