Development Horror Story – Release Nightmare
Everyone has good stories about releases that went wrong, right? I’m no exception, and I have a few good ones from my development career. These are usually very stressful at the time, but now my teammates and I can’t talk about these stories without laughing.
History
I think this happened around 2009. My team and I had to maintain a medium to large legacy web application with around 500k lines of code. This application was developed by another company, so we didn’t have the code. Since we were in charge now and needed the code to maintain it, they handed us the code in a zip file (the first sign that something was wrong)!
Their release process was peculiar, to say the least. I’m pretty sure there are worse release procedures out there. This one consisted of copying the changed files (*.class, *.jsp, *.html, etc.) to an exploded war folder on a Tomcat server. We also had three environments (QA, PRE, PROD) with different application versions and no idea which files were deployed on each. They also had a ticket management application with compiled files attached, ready to be deployed, and no trace of the original sources. What could possibly go wrong here?
The Problem
Our team was able to make the changes required by the customer and push them to the PROD servers. We had done it a few times successfully, even with all the handicaps. Everything was looking good until we got another request for additional changes. These changes were only a few improvements to the log messages of a batch process. The purpose of the batch was to copy files with financial data sent to the application and insert their contents into a database. I guess I don’t have to state the obvious: this data was critical for calculating financial movements, with a direct impact on the amounts paid by the application users.
After our team made the changes and performed the release, all hell broke loose. Files were not being copied to the correct locations. Data was duplicated in the database and the file system. Financial transactions had incorrect amounts. You name it. A complete nightmare. But why? The only change was a few improvements to the log messages.
The Cause
The problem was not exactly related to the changed code. Look at the following files:
BatchConfiguration
public class BatchConfiguration {
    public static final String OS = "Windows";
}
And:
public class BatchProcess {
    public void copyFile() {
        if (BatchConfiguration.OS.equals("Windows")) {
            System.out.println("Windows");
        } else if (BatchConfiguration.OS.equals("Unix")) {
            System.out.println("Unix");
        }
    }

    public static void main(String[] args) {
        new BatchProcess().copyFile();
    }
}
This is not the real code, but for the purposes of the problem it was laid out like this. Don’t ask me why it was like this. We got it in the zip file, remember?
So we have here a variable that sets the expected operating system, and the logic to copy the file depends on it. The server was running on a Unix box, so the variable value was Unix. Unfortunately, all the developers were working on Windows boxes. I say unfortunately because, if the developer who implemented the changes had been using Unix, everything would have been fine.
Anyway, the developer changed the variable to Windows so he could proceed with some tests. Everything was fine, so he performed the release. He copied the resulting BatchProcess.class to the server. He didn’t bother with BatchConfiguration, since the one on the server was configured to Unix, right?
Maybe you already spotted the problem. If you haven’t, try the following:
- Copy and build the code.
- Execute it. Check the output; you should get Windows.
- Copy the resulting BatchProcess.class to an empty directory.
- Execute this one again, using the command line java BatchProcess
What happened? You got the output Windows, right? Wait! We didn’t have the BatchConfiguration.class file in the executing directory. How is that possible? Shouldn’t we need this file there? Shouldn’t we get an error?
When you build the code, the Java compiler inlines the BatchConfiguration.OS variable, because a static final String initialized with a literal is a compile-time constant. This means that the compiler replaces the variable expression in the if statement with the actual variable value. It’s like having if ("Windows".equals("Windows")).
Try executing javap -c BatchProcess. This will show you a bytecode representation of the class file:
BatchProcess.class
public void copyFile();
  Code:
     0: ldc           #3    // String Windows
     2: ldc           #3    // String Windows
     4: invokevirtual #4    // Method java/lang/String.equals:(Ljava/lang/Object;)Z
     7: ifeq          21
    10: getstatic     #5    // Field java/lang/System.out:Ljava/io/PrintStream;
    13: ldc           #3    // String Windows
    15: invokevirtual #6    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    18: goto          39
    21: ldc           #3    // String Windows
    23: ldc           #7    // String Unix
    25: invokevirtual #4    // Method java/lang/String.equals:(Ljava/lang/Object;)Z
    28: ifeq          39
    31: getstatic     #5    // Field java/lang/System.out:Ljava/io/PrintStream;
    34: ldc           #7    // String Unix
    36: invokevirtual #6    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    39: return
You can confirm that every reference to BatchConfiguration.OS has been replaced with its constant value, so the class file no longer reads the field at all.
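For comparison, here is a minimal sketch of a configuration class whose value the compiler cannot inline. It is not what the original application did; the class name SafeBatchConfiguration and the batch.os system property are made up for illustration. Because the initializer is a method call rather than a literal, the field is not a compile-time constant, so classes that use it read the field at runtime instead of carrying their own copy of the value.

```java
public class SafeBatchConfiguration {
    // Hypothetical property name, chosen for this sketch only.
    public static final String OS_PROPERTY = "batch.os";

    // The initializer is a method call, so OS is not a compile-time constant.
    // Callers compile to a field read on this class rather than a copied literal.
    public static final String OS = System.getProperty(OS_PROPERTY, "Unix");

    public static void main(String[] args) {
        // Falls back to "Unix" unless the JVM is started with -Dbatch.os=<value>.
        System.out.println(OS);
    }
}
```

With a field like this, copying only BatchProcess.class to the server would still pick up whatever value the server's own configuration class resolves to.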
Now, returning to our problem. The .class file that was copied to the PROD servers had the Windows value baked in. This broke the runtime logic that handled the input files with the financial data, and it was the cause of the problems I’ve described earlier.
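In hindsight, a flag like this does not need to live in the source at all. The following is only a sketch of one alternative, assuming it is acceptable to treat every non-Windows platform as Unix; the OsDetection class and its method names are hypothetical and were not part of the application. It asks the JVM for the standard os.name system property at runtime, so a class compiled on a Windows box behaves correctly on a Unix server.

```java
import java.util.Locale;

public class OsDetection {
    /**
     * Maps an os.name value to the two labels used by the batch code.
     * Assumption for this sketch: anything that is not Windows counts as Unix.
     */
    public static String classify(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "Windows" : "Unix";
    }

    /** Resolved when called, so no value is baked into any class file. */
    public static String currentOs() {
        return classify(System.getProperty("os.name"));
    }

    public static void main(String[] args) {
        System.out.println(currentOs());
    }
}
```

The copyFile logic could then branch on OsDetection.currentOs() instead of a constant that someone has to remember to flip back before a release.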
Aftermath
Fixing the original problem was easy. Fixing the problems caused by the release was painful. It involved many people, many hours, pizza, loads of SQL queries, shell scripts and so on. Even our CEO came to help us. We called this the mUtils problem, since that was the name of the original Java class containing the code.
Yes, we migrated the code to something manageable. It’s now on a VCS with a tag for every release and version.
Reference: Development Horror Story – Release Nightmare from our JCG partner Roberto Cortez at the Roberto Cortez Java Blog.