Using a memory mapped file for a huge matrix
Matrices can be really large, sometimes larger than you can hold in one array. You can extend the maximum size by having multiple arrays however this can make your heap size really large and inefficient. An alternative is to use a wrapper over a memory mapped file. The advantage of memory mapped files is that they have very little impact on the heap and can be swapped in and out by the OS fairly transparently.
A huge matrix
This code supports large matrices of double. It partitions the file into 1 GB mappings. (As Java doesn’t support mappings of 2 GB or more at a time, a pet hate of mine ;)
import sun.misc.Cleaner; import sun.nio.ch.DirectBuffer; import java.io.Closeable; import java.io.IOException; import java.io.RandomAccessFile; import java.nio.MappedByteBuffer; import java.nio.channels.FileChannel; import java.util.ArrayList; import java.util.List; public class LargeDoubleMatrix implements Closeable { private static final int MAPPING_SIZE = 1 << 30; private final RandomAccessFile raf; private final int width; private final int height; private final List mappings = new ArrayList(); public LargeDoubleMatrix(String filename, int width, int height) throws IOException { this.raf = new RandomAccessFile(filename, "rw"); try { this.width = width; this.height = height; long size = 8L * width * height; for (long offset = 0; offset < size; offset += MAPPING_SIZE) { long size2 = Math.min(size - offset, MAPPING_SIZE); mappings.add(raf.getChannel().map(FileChannel.MapMode.READ_WRITE, offset, size2)); } } catch (IOException e) { raf.close(); throw e; } } protected long position(int x, int y) { return (long) y * width + x; } public int width() { return width; } public int height() { return height; } public double get(int x, int y) { assert x >= 0 && x < width; assert y >= 0 && y < height; long p = position(x, y) * 8; int mapN = (int) (p / MAPPING_SIZE); int offN = (int) (p % MAPPING_SIZE); return mappings.get(mapN).getDouble(offN); } public void set(int x, int y, double d) { assert x >= 0 && x < width; assert y >= 0 && y < height; long p = position(x, y) * 8; int mapN = (int) (p / MAPPING_SIZE); int offN = (int) (p % MAPPING_SIZE); mappings.get(mapN).putDouble(offN, d); } public void close() throws IOException { for (MappedByteBuffer mapping : mappings) clean(mapping); raf.close(); } private void clean(MappedByteBuffer mapping) { if (mapping == null) return; Cleaner cleaner = ((DirectBuffer) mapping).cleaner(); if (cleaner != null) cleaner.clean(); } } public class LargeDoubleMatrixTest { @Test public void getSetMatrix() throws IOException { long start = System.nanoTime(); final long used0 = usedMemory(); LargeDoubleMatrix matrix = new LargeDoubleMatrix("ldm.test", 1000 * 1000, 1000 * 1000); for (int i = 0; i < matrix.width(); i++) matrix.set(i, i, i); for (int i = 0; i < matrix.width(); i++) assertEquals(i, matrix.get(i, i), 0.0); long time = System.nanoTime() - start; final long used = usedMemory() - used0; if (used == 0) System.err.println("You need to use -XX:-UseTLAB to see small changes in memory usage."); System.out.printf("Setting the diagonal took %,d ms, Heap used is %,d KB%n", time / 1000 / 1000, used / 1024); matrix.close(); } private long usedMemory() { return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory(); } }
With the following test, which writes to each of the diagonal values of a one million * one million matrix. This is too large to hope to create on the heap.
Setting the diagonal took 314,819 ms, Heap used is 2,025 KB $ ls -l ldm.test -rw-rw-r-- 1 peter peter 8000000000000 2011-12-30 12:42 ldm.test $ du -s ldm.test 4010600 ldm.test
That’s 8,000,000,000,000 bytes or ~7.3 TB in virtual memory, in a Java process! This works because it only allocates or pages in the pages which you use. So while the file size is almost 8 TB, the actual disk space and memory used is 4 GB.
With a more modest file size of 100K * 100K matrix you see something like the following. Its still an 80 GB matrix, which uses trivial heap space. ;)
Setting the diagonal took 110 ms, Heap used is 71 KB $ ls -l ldm.test -rw-rw-r-- 1 peter peter 80000000000 2011-12-30 12:49 ldm.test $ du -s ldm.test 400000 ldm.test
Reference: Using a memory mapped file for a huge matrix from our JCG partner Peter Lawrey at the Vanilla Java blog
Related Articles :
Hey Peter! Thanks for this article. Could you please tell me what type of file are you passing to the LargeMatrix class? I want to pass a file that is of the format ‘i, j, value’ where i and j are the indices of the matrix and ‘value’ is the value I want to store in that location.
@jamheart I would just store the value and calculate the location. You only need to store the i and j or x and y if you have a sparse matrix.