The Definitive Guide to Coding Style Standards
I’ve been rereading Jeff Atwood (Coding Horror) and Joel Spolsky (Joel on Software) and came across a discussion of coding style standards. It reminded me of the blog entry I was going to write “some day”.
Today is “some day”.
First, let me ruin the joke by explaining it: this is the “definitive guide” because nobody ever seems to say they just have some ideas and are open to other approaches. Everyone has the answer. In this case, however, the emphasis is on the process of choosing a standard, not on the precise details of the standard you choose. I’ve found Atwood’s and Spolsky’s arguments compelling, and I’ve seen them pay off in practice, but I’ll never claim to have the final answer. Technology changes the trade-offs; the syntax colorers in modern IDEs are one example.
It’s 10 pm and your dinner came from the vending machine
Think about doing a task at work. What time is it? Around 10 am when you’re still fresh? Or perhaps 2 pm when you might have a bit of the afternoon doldrums but are still relaxed after lunch and focused on work?
Nobody ever thinks about what life is like when it’s 10 pm, your dinner came from the vending machine, and you have no idea when you can finally go home. That’s the critical time, when it’s easy to make stupid mistakes and hard to find them! Good standards might not stop you from making stupid mistakes, but they should give bad code a distinct smell that makes the mistakes easier to spot.
Always keep this in mind – coding standards, requirements for minimum unit test coverage and the like should not be adopted because somebody thinks the code should look pretty or they like green on their CI screens. They’re adopted because they make it easier to produce good code under stressful conditions.
Why coding style standards matter
The one-word answer: focus.
A developer’s most critical resource is focus. It’s finite and easily exhausted. Worse, the effects are non-linear. Tight focus, a clear goal and no distractions means you can get into a state of “flow” and be incredibly productive. It means you can write code and simultaneously think about how you can test it and how to reduce the amount of work you have to do in the future.
But add a surprisingly small amount of distraction, such as a blank line at the end of a conditional block that feels like missing a step on the stairs (or worse, leaves you panicked that you accidentally deleted a line of code), or similar variable names like m_x and m_y, and you’re knocked out of flow. You’re writing code that works but isn’t easily extensible. Crank up the distractions and you write code that duplicates something already in a standard library, or you overlook a border case.
10 PM Examples
As a first example, consider
if (name.isEmpty()) { ... }
and
if (name.length() == 0) { ... }
There isn’t much difference at 10 am. When you’re tired, though, the first makes it explicit that you’re checking for an empty string. The second takes a bit of focus: you have to see that you’re comparing against a special value (zero), make sure it isn’t something that looks similar (0 vs 8), and make the slight mental shift from the test (numeric) to its meaning (the string is empty).
One line isn’t going to make much of a difference, but a dozen in less than five minutes? Each ding might cost only 0.2% of your focus, but those 0.2% dings quickly add up.
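A minimal Java sketch of the two checks side by side; the class and method names are hypothetical. For any non-null String they always agree, so the difference is purely in how much translation the reader has to do. Neither handles null: both throw a NullPointerException if name is null.

```java
// Hypothetical helper class contrasting the two ways of testing for an empty string.
class EmptyChecks {
    // Reads as the intent: "is this string empty?"
    static boolean isEmptyByIntent(String name) {
        return name.isEmpty();
    }

    // Same result, but the reader must translate "length is zero" into "empty".
    static boolean isEmptyByLength(String name) {
        return name.length() == 0;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"", " ", "Bob"}) {
            System.out.println("\"" + name + "\": " + isEmptyByIntent(name) + " " + isEmptyByLength(name));
        }
    }
}
```

Note that a string containing only a space is not empty under either check.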
Consistency, consistency, consistency!
Any standard is better than no standard, but it’s even more important to embrace it fully. Nothing is more distracting (i.e., nothing costs more focus) than hitting a block of code that doesn’t follow the same pattern as everything else.
On the other hand, a lack of consistency can be fun. I was once brought in on a short contract to work on some code that didn’t have any active developers. It was clearly written by three people who each had their own style and not much respect for the others. (Exactly how many linked list implementations do you need in an app with 10k lines of code? Naturally they handled the border cases slightly differently.) I mentioned my observations about the prior developers to a coworker on a sister project and seriously freaked him out, because I was able to describe people who had left the company over a year earlier.
Important: the key point here is consistency. It’s far better for the code to consistently follow a horrid style than to be a mishmash of styles.
Automate
Standards don’t mean anything if the formatter is never run. You need to automate it.
Modern IDEs should provide a way to format code automatically whenever a file is saved. In Eclipse it’s under Preferences > Java > Editor > Save Actions. I use this, and it guarantees that my files are always properly formatted. (If I’m making lots of changes I’ll also call the formatter explicitly in the editor.)
Another approach is a CI job that periodically reformats everything: it checks out the code, reformats it, runs the tests and, if they pass, commits the result. This can be done nightly, but it should be done at least once a week.
A seriously bad approach is to use your source control system’s hooks to format the code as it’s checked in. It’s okay to use those hooks to verify that the code is properly formatted and to reject it if it isn’t, but you never want the source control system to modify the code.
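To make the “verify, don’t modify” distinction concrete, here is a minimal check-only sketch in Java. It only reports problems and never rewrites anything, so a hook or CI step could reject a change based on its output. The class name FormatCheck and its three rules (no tabs, no trailing whitespace, at most 120 columns) are assumptions, loosely modeled on the Apache-style Java settings mentioned in the conclusions. Wiring it into a particular hook is not shown, and a real project would more likely run its formatter in a check-only mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check-only formatter gate: it reports violations but never edits the text.
class FormatCheck {
    static final int MAX_LINE_LENGTH = 120;

    // Returns one message per violation; an empty list means the text passes.
    // Simplification: a CRLF line ending is reported as trailing whitespace.
    static List<String> check(String source) {
        List<String> problems = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int lineNumber = i + 1;
            if (line.indexOf('\t') >= 0) {
                problems.add(lineNumber + ": tab character");
            }
            if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
                problems.add(lineNumber + ": trailing whitespace");
            }
            if (line.length() > MAX_LINE_LENGTH) {
                problems.add(lineNumber + ": longer than " + MAX_LINE_LENGTH + " characters");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String source = "class A {\n\tint x; \n}\n";
        // prints "2: tab character" then "2: trailing whitespace"
        check(source).forEach(System.out::println);
    }
}
```

A hook would treat a non-empty result as “reject”, leaving the developer to run the formatter locally and commit again.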
Optimize the right thing
This is a common problem at sites that use several programming languages. They have one coding style standard and apply it to all of them.
No, no, no!
It might be a little easier to document a single style, but remember my comments about fatigued developers. I should be able to glance at code and immediately tell you whether it’s C/C++, Java, Scala, JavaScript or whatever, no matter how tired I am. I shouldn’t have to look at the functions or methods used, and I shouldn’t have to scroll the editor; I should know everything I need at a glance, even if my eyes are blurry.
Why does this matter? Consider, oh, memory management. C and C++ require you to release memory explicitly; Java and Scala do not. (C++ destructors can help, but you still need to deallocate many things explicitly.)
10 PM Example
This is a thin thread but consider two styles:
void method() {
    Book book = new Book();
    if (test()) {
        doSomething(book);
    }
}
and
void method()
{
    Book *book = (Book *) malloc(sizeof(Book));
    if (test())
    {
        do_something(book);
    }
    free(book);
}
With distinct styles I can see the braces and be subconsciously reminded to check for a malloc(), and then for a matching free() if I find one. With identical styles for Java and C I’ll need to look much more deliberately, especially with Java package-private methods, whose signatures have no visibility modifier and so look just like C functions.
It helps even more when the syntax colorer uses different colors.
Don’t code like it’s 1999
In 1999 I used ‘vi’ and had to keep track of everything myself. I never fell into practices like using an “m_” prefix for object fields (the “m” is for “member”), although I know others did.
As of 2014, we’ve had IDEs for years, and they come with several important tools.
Syntax Coloring
With syntax coloring it’s obvious at a glance whether a name is a static (class) field, an instance (object) field, or a parameter. In Eclipse, for example, a static field is bold, an instance field is blue and a parameter is black. All a prefix does is waste a little focus on working out what the field actually is, especially in methods that don’t have a similarly named parameter.
Compare
public double getNewX(double theta) { return x*sin(theta) + y*cos(theta); }
with
public double getNewX(double theta) { return m_x*Math.sin(theta) + m_y*Math.cos(theta); }
(I cheated by adding a static import of Math methods but it follows the same spirit!)
The blog doesn’t show it, but an IDE will probably show ‘x’, ‘sin’ and ‘theta’ in different colors, so you’ll immediately know which are variables, where they’re declared, and so on.
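For reference, a compilable version of the unprefixed snippet might look like the sketch below. The Point class and its constructor are hypothetical, and the expression is copied from the example above rather than being a claim about any particular transformation. The two static imports are what make the unqualified sin and cos legal.

```java
import static java.lang.Math.cos;
import static java.lang.Math.sin;

// Hypothetical class: fields are named plainly (x, y) rather than m_x, m_y.
class Point {
    private final double x;
    private final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Expression taken from the example above; the static imports let it read like math.
    public double getNewX(double theta) {
        return x*sin(theta) + y*cos(theta);
    }

    public static void main(String[] args) {
        System.out.println(new Point(3.0, 4.0).getNewX(0.0));
    }
}
```

Note that the constructor already needs this.x = x to disambiguate, which is exactly the case the IDE’s coloring makes obvious.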
Integrated Compiler Warnings
With integrated compiler warnings we’re immediately warned of questionable code like
public void setName(String name) { name = name; }
There are two correct implementations; for the reasons mentioned earlier, I prefer the first:
public void setName(String name) { this.name = name; }
and
public void setName(String name) { m_name = name; }
Language Features
Finally, languages now support keywords like ‘final’ (Java) and ‘const’ (C and C++). It might take a little time to get used to them, but they elevate some compiler warnings to compiler errors: with a final parameter, the self-assignment bug shown above no longer compiles.
Java:
public void setName(final String name) { this.name = name; }
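C (there is no implicit object, so a struct person with a const char *name field is passed explicitly; the second ‘const’ is what makes the pointer parameter itself read-only):
void set_name(struct person *self, const char *const name) { self->name = name; }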
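A minimal Java sketch of the setter bug and its fix, using two hypothetical classes. The buggy version compiles (javac accepts it; IDEs such as Eclipse flag the assignment as having no effect), but it silently leaves the field null. The fixed version uses this.name and a final parameter, so accidentally writing name = name; in it would be a compile error rather than a warning.

```java
// Hypothetical class showing the bug: the parameter is assigned to itself.
class BuggyPerson {
    private String name;

    // Compiles, but the field is never touched; it stays null.
    public void setName(String name) {
        name = name;
    }

    public String getName() {
        return name;
    }
}

// Hypothetical class showing the fix.
class Person {
    private String name;

    // "this." makes the target explicit; "final" turns "name = name;" into a compile error.
    public void setName(final String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}
```

Calling setName("Ann") on each and then getName() returns null for BuggyPerson and "Ann" for Person.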
Color Laser Printer
Inkjet printers are good for photos. Copier-based printers are good for speed. But if you’re a developer you need a color laser printer. You will never go back. The text is crisp (so you can distinguish between ‘0’ and ‘O’ or ‘l’ and ‘1’) and it doesn’t smear (so you can keep distinguishing between them). They’re also cheap: about six months ago I bought a duplex HP with maxed-out memory for $600 for home… and Costco has, annoyingly, just put a similar model on sale for $300 ($150 off). (They’re probably clearing out inventory for a newer model.)
An apparent downside is toner cost – it will cost over $400 to refill my printer – but that’s an illusion since the cost-per-page is a fraction of the cost for inkjet printers. It just stings because it’s a once-per-year expense (for me) instead of a monthly expense.
I print a lot less today than I did a decade ago, but when I do, it makes a world of difference to have it in color on good paper rather than in monochrome on bulk copier paper.
Follow the standards
Finally, follow the established standards for each language reasonably closely. There are two reasons for this.
The historical reason is that standards that have lasted a decade have survived review by a lot of smart people, many of them smarter than you. If they don’t see a problem with them, why should you?
The contemporary reason is that developers should be reading a lot of code written by others: in answers found through searches on Stack Overflow, in articles on DZone and Java Code Geeks, and in open source libraries, to see how others have solved a problem. Remember focus: it’s easier to absorb new information when you eliminate friction such as unfamiliar styles.
Conclusions
For what it’s worth, for Java I personally prefer the Apache standards: the original Java standards, but with spaces instead of tabs and a line length of 120 characters. Arrays look like “char[] data”.
For C I’ll typically use the style shown above. Opening braces are on their own line and arrays look like “char data[]”.
If you roll your own, automate the heavy lifting. No 80-page bibles! The things that can’t be automated should fit on a single page: whether you use camel case (Java) or underscores (C), whether you use Hungarian notation, etc. Remember that you want each language to have a distinct look that is standard (or nearly so) for that language!
Finally, don’t confuse coding style standards with coding standards. You should use ‘final’ and ‘const’ heavily because it creates better code, not because it’s the standard style.
Sidenote: Hungarian notation
Mandatory reading: Joel Spolsky’s “Making Wrong Code Look Wrong.”
Hungarian notation, as widely used in the Windows world, is wrong. Vile. Evil. I rip off its head and piss on its grave.
The original Hungarian notation, as Joel points out, is valuable. In the Java world a good place to use it is with user-provided values. Anything that comes from the user(*) gets a ‘u’ prefix, for ‘unsafe’; anything that has been sanitized gets an ‘s’ prefix, for ‘safe’. It’s tempting to make the ‘s’ optional, but an explicit ‘s’ prefix reminds you that some values could be unsafe. Normal camel case applies, so it’s “uName”, not “uname”. Perl handles this well with its ‘taint’ mode.
(I’ve been playing around with the idea of using annotations to do the same in Java but annotations are class-based, not object-based, so I’ve only been able to get as far as marking classes as possibly unsafe. There’s no real gain over simply implementing a marker interface.)
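A minimal sketch of the ‘u’/‘s’ convention in Java, using HTML output as the place where an unsafe value would do damage. The class Greeting and the helpers escapeHtml and greet are hypothetical. escapeHtml is a deliberately small hand-rolled encoder for illustration only; a real application should rely on a vetted encoding library instead.

```java
// Hypothetical example of the 'u' (unsafe) / 's' (safe) prefix convention.
class Greeting {
    // Illustrative HTML encoder: takes an unsafe value, returns a safe one.
    static String escapeHtml(String uText) {
        StringBuilder sb = new StringBuilder(uText.length());
        for (char c : uText.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Only 's' values belong in the page; a call like greet(uName) should look wrong.
    static String greet(String sName) {
        return "<p>Hello, " + sName + "</p>";
    }

    public static void main(String[] args) {
        String uName = "<script>alert(1)</script>"; // came from the user: unsafe
        String sName = escapeHtml(uName);            // sanitized: safe
        System.out.println(greet(sName));
    }
}
```

The compiler can’t enforce any of this; the value is that a line mixing a ‘u’ variable with HTML output stands out at a glance, even at 10 pm.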
The “m_” prefix for object fields? I think it was useful in the mid-90s, when people were first making the shift from C to C++ and our choice of editor was ‘vi’ or ‘emacs’. Today object-oriented designs are so commonplace that I think it’s just clutter, akin to someone saying “I would like to take a cruise to Alaska in the year 2015” or “I started working here in the year 2012”. However, this should defer to local conventions unless and until you’re ready to do a mass edit.
(* “User” includes databases and files. Anything that could be manipulated by someone outside of the application.)