Detecting (write) failures when using memory mapped files in Java
Memory mapped files are a good and often overlooked tool. I won’t go into the details of how they work here (use the force Google Luke!), but I will quickly summarize their advantages:
- lazy loading and write caching provided by the OS (you don’t have to write your own, and it’s a safe bet that the OS’s implementation performs well)
- easy reading of complicated binary data (for example, data with all kinds of relative offsets encoded in it)
- can be used as a very high performance IPC mechanism
- written to disk even if your process crashes (if the OS survives)
- very high speed writes because you don’t block (the asynchronous flush is provided by the OS) and you don’t need to enter kernel mode
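To make the first points concrete, here is a minimal sketch (assuming Java 7 or later; the class name, the 4 KiB size and the header-plus-record layout are made up for illustration). It maps a file, writes through ordinary memory stores and then follows a relative offset stored in the file itself:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedOffsets {
    // Hypothetical 4 KiB layout: bytes 0..3 hold the offset of a record; the record is one int.
    static final int FILE_SIZE = 4096;

    static int writeAndReadBack(Path file, int recordOffset, int value) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping past the end of the file grows it to FILE_SIZE bytes.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, FILE_SIZE);
            buffer.putInt(0, recordOffset);     // header: where the record lives
            buffer.putInt(recordOffset, value); // a plain memory store, no write() call
            // Following the relative offset is just two absolute reads.
            int offset = buffer.getInt(0);
            return buffer.getInt(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit(); // best effort; a live mapping can block deletion on Windows
        System.out.println(writeAndReadBack(file, 128, 42)); // prints 42
    }
}
```

The OS decides when the dirty page actually reaches the disk, which is exactly what raises the question below.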
However, with all this asynchronicity I was left wondering: what happens in case of a disk failure? How can the OS notify your process that it failed to write to disk the data you’ve written to memory?
A little bit of searching turned up the answers:
- Under Linux your process gets a SIGBUS when the OS tries to write the memory back to disk but fails
- Under Windows you get an EXCEPTION_IN_PAGE_ERROR error the next time you try to call an OS function on the file handle
Wanting to confirm the information, I whipped up a quick test program, plugged a sacrificial USB drive into my laptop and ran a couple of tests. The conclusions are:
- Sure enough Linux generates a SIGBUS and Java (OpenJDK 1.7.0_51-b00) has no handler for it, crashing the process:
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGBUS (0x7) at pc=0x00007f9bb5042396, pid=26654, tid=140306951444224
    #
    # JRE version: OpenJDK Runtime Environment (7.0_51) (build 1.7.0_51-b00)
    # Java VM: OpenJDK 64-Bit Server VM (24.45-b08 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # v ~StubRoutines::jlong_disjoint_arraycopy
On the upside, you know that something went horribly wrong since your process ceased to be. On the downside, you might not immediately know why (unless you’ve read this post).
- Linux can also generate a more “traditional” error condition, for example if you try to flush the file:
    Exception in thread "main" java.io.IOException: Input/output error
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
        at Main.main(Main.java:84)
- Windows only generates exceptions when you operate on the file handle again (for example by flushing it – as on Linux – but also when creating new mappings – something I did not see on Linux):
    Exception in thread "main" java.io.IOException: The volume for a file has been externally altered so that the opened file is no longer valid
        at sun.nio.ch.FileDispatcherImpl.size0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.size(FileDispatcherImpl.java:96)
        at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:307)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
        at Main.main(Main.java:64)
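A test program along these lines could be used to reproduce the results above. It is a sketch, not the original test program: the class and method names, the default file name, the mapping size and the one-second pause are all hypothetical, and it assumes Java 8 or later (for UncheckedIOException). It remaps the file on every round so that the Windows failure in map() can surface, and flushes after every round so that the Linux IOException from force() can. A SIGBUS cannot be caught this way: the JVM simply dies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedWriteTest {
    // Remaps, fills and flushes the file `rounds` times; returns how many rounds
    // were flushed before an I/O error surfaced. A SIGBUS is NOT catchable here:
    // on Linux the JVM crashes instead of returning.
    static int runRounds(Path path, int rounds, int size, long pauseMillis)
            throws InterruptedException {
        int flushed = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int round = 0; round < rounds; round++) {
                // On Windows, map() itself threw once the volume had been removed.
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
                for (int i = 0; i + 8 <= size; i += 8) {
                    buffer.putLong(i, round); // plain memory stores
                }
                buffer.force();      // ask the OS to write back the mapped pages
                channel.force(true); // on Linux this reported "Input/output error"
                flushed++;
                Thread.sleep(pauseMillis); // time window for pulling the drive
            }
        } catch (IOException | UncheckedIOException e) {
            System.err.println("write failure detected after " + flushed + " rounds: " + e);
        }
        return flushed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical default; point it at a file on a sacrificial removable drive.
        Path path = Paths.get(args.length > 0 ? args[0] : "mmap-test.bin");
        System.out.println("rounds flushed: " + runRounds(path, 60, 16 * 1024 * 1024, 1000));
    }
}
```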
Conclusion: memory mapped files are great – like a very sharp knife is great: you can do great things with them very quickly, but they can also cut your finger off. If you want to use memory mapped files because of the advantages they offer:
- be ready to crash. Have a plan for when it will happen (hot standby, warm standby, do nothing – these are all valid options, but decide in advance)
- if you want to be sure that the data is on the disk, flush it. When the flush returns, you can be (almost) certain that the data is on disk (we won’t get into the wonderful world of disk / controller caches or virtualized servers here).
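Following the second recommendation, a small flush helper could look like the sketch below (the name flushDurably is hypothetical; Java 8 or later assumed). It calls MappedByteBuffer.force() for the mapped pages, because FileChannel.force() is only documented to cover changes made through the channel itself, and then FileChannel.force(true), which declares IOException. Depending on the JDK version, a failed flush of the mapping itself may surface as an UncheckedIOException, so both are caught. A true result only means the OS reported success; disk and controller caches remain out of scope.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableMappedWrites {
    // Hypothetical helper: returns true if the OS accepted the flush,
    // false if it reported an I/O error. It cannot help against a SIGBUS.
    static boolean flushDurably(MappedByteBuffer buffer, FileChannel channel) {
        try {
            buffer.force();      // write back the dirty pages of this mapping
            channel.force(true); // flush file data and metadata; declares IOException
            return true;
        } catch (IOException | UncheckedIOException e) {
            System.err.println("flush failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("durable", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buffer.putLong(0, 123456789L);
            System.out.println(flushDurably(buffer, channel) ? "flushed" : "flush failed");
        }
    }
}
```

What to do when it returns false (retry, fail over, stop accepting writes) is the decision the first recommendation asks you to make in advance.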
It is not clear to me when and how you get the error condition in the form of an exception if SIGBUS kills the process first. Could you explain?
The SIGBUS comes at a “random” moment, not necessarily when you access the memory mapped file (something along these lines: if the file is still in memory – courtesy of the kernel’s filesystem cache – you don’t get a SIGBUS. However, if it was evicted from the cache and you try to access it – and the kernel in turn tries to read it back from the source – you get a SIGBUS). You can get an exception under Linux if you “manipulate” the file (for example, you try to flush it) while it is still in the…