Caching Strategy Reminder for Maven-Based Docker Builds

Keyhole Software, January 7th, 2015 (Last Updated: January 7th, 2015)


My local development feedback loop between code change and runnable container was annoyingly long on a Maven-based project I was recently working on. I wanted to speed things up.

The scenario was something like this:

touch/change some source code
docker build
maven downloads the world
maven compiles my project
docker run
touch/change some source code
docker build
maven downloads the world
maven compiles my project
docker run
touch/change some source code
docker build
maven downloads the world
maven compiles my project
docker run
…

I didn’t really enjoy the “maven downloads the world” steps, and wanted to minimize the number of times it needed to run.

Let’s follow along as I make my situation a little better. For illustration, we’ll start off with this generic skeleton project created from a Maven archetype: App.java, its JUnit test AppTest.java, a Dockerfile, and the pom.xml:

package com.keyholesoftware.blog;
 
public class App
{
    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }
}

package com.keyholesoftware.blog;
 
import junit.framework.*;
 
public class AppTest extends TestCase
{
    public void testApp()
    {
        assertTrue( true );
    }
}

FROM maven:3.2.5-jdk-8u40
 
RUN mkdir --parents /usr/src/app
WORKDIR /usr/src/app
 
ADD . /usr/src/app
RUN mvn verify

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
 
  <groupId>com.keyholesoftware.blog</groupId>
  <artifactId>khs-docker-caching-blog</artifactId>
  <version>1.0-SNAPSHOT</version>
 
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Things aren’t that bad when I am building back-to-back, e.g.

$ docker build .
  ...
$ docker build .
  ...

Notice that the second build is fast, because every step is served from Docker’s build cache. But what about when we do something like this:

$ docker build .
  ...
$ touch src/main/java/com/keyholesoftware/blog/App.java
  ...
$ docker build .
  ...

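Notice that the second build is unnecessarily slowed down by the redownload portion. Because App.java has changed, Docker can no longer reuse the cached ADD . /usr/src/app layer or any layer after it, so mvn verify starts over from an empty local repository and downloads every dependency and plugin again.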

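I sat around and despaired for a while until I remembered a trick I’ve seen for selective caching: add the POM by itself first and let Maven do its downloading in that layer, so that later source changes don’t invalidate it: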

FROM maven:3.2.5-jdk-8u40
 
RUN mkdir --parents /usr/src/app
WORKDIR /usr/src/app
 
 
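# selectively add the POM file first; this layer only changes when the POM does
ADD pom.xml /usr/src/app/
# get all the downloads out of the way: with no sources present yet, this build
# mostly just resolves dependencies and plugins; --fail-never keeps the step
# from failing, and clean removes the (empty) build output
RUN mvn verify clean --fail-never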
 
 
ADD . /usr/src/app
RUN mvn verify

Let’s try that sequence again.

$ docker build .
 ...
$ touch src/main/java/com/keyholesoftware/blog/App.java
  ...
$ docker build .
  ...

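Getting better, but a few downloads still happened during the second build, all related to the Surefire test plugin. Surefire picks its test-framework provider at run time, based on the test framework it finds in the project (here surefire-junit3, matching JUnit 3.8.1); the POM-only build has no tests to run yet, so presumably the provider was never fetched in the cached layer. This process actually helps us iron out downloads that are chosen dynamically and lock them down. In this case, we lock down our Surefire provider by declaring it as a dependency of the plugin: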

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
 
  <groupId>com.keyholesoftware.blog</groupId>
  <artifactId>khs-docker-caching-blog</artifactId>
  <version>1.0-SNAPSHOT</version>
 
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
 
  <properties>
    <surefire.version>2.8.1</surefire.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>${surefire.version}</version>
        <!-- lock down our surefire provider -->
        <dependencies>
          <dependency>
            <groupId>org.apache.maven.surefire</groupId>
            <artifactId>surefire-junit3</artifactId>
            <version>${surefire.version}</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
 
</project>

Let’s try that sequence again.

$ docker build .
  ...
$ touch src/main/java/com/keyholesoftware/blog/App.java
  ...
$ docker build .
  ...

So now, unless we change the POM, we don’t have to redownload anything. Nice.
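To see why splitting the ADD helps, here is a toy model of Docker’s build cache in Java. It is a sketch under simplifying assumptions, not Docker’s actual algorithm: each step’s cache key is its parent’s key plus the instruction text, and an ADD step also folds in the contents of the files it copies. The class LayerCacheDemo, its methods, and the file contents are hypothetical. (Current Docker releases document that file modification times are not part of the ADD/COPY checksum, so with a recent Docker you may need to actually edit App.java, not just touch it, to reproduce the cache miss.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical toy model of Docker's layer cache; NOT Docker's real implementation. */
public class LayerCacheDemo {
    private final Set<String> seenKeys = new HashSet<>();

    /** Runs one build over the given context; returns true for each step that hit the cache. */
    public List<Boolean> build(List<String> steps, Map<String, String> context) {
        List<Boolean> hits = new ArrayList<>();
        String parentKey = "FROM maven:3.2.5-jdk-8u40";
        for (String step : steps) {
            String key = parentKey + "\n" + step;
            if (step.startsWith("ADD ")) {
                // Assumption: an ADD step's key also depends on the contents it copies.
                key += "\n" + digest(step.split(" ")[1], context);
            }
            // Set.add returns false when the key was already present, i.e. a cache hit.
            hits.add(!seenKeys.add(key));
            // Later keys build on this one, so after one miss every later step misses too.
            parentKey = key;
        }
        return hits;
    }

    /** Stand-in for a file checksum: sorted name=content pairs ("." means the whole context). */
    private static String digest(String source, Map<String, String> context) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(context).entrySet()) {
            if (source.equals(".") || source.equals(e.getKey())) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String app = "src/main/java/com/keyholesoftware/blog/App.java";
        List<String> naive = Arrays.asList("ADD . /usr/src/app", "RUN mvn verify");
        List<String> selective = Arrays.asList(
                "ADD pom.xml /usr/src/app/",
                "RUN mvn verify clean --fail-never",
                "ADD . /usr/src/app",
                "RUN mvn verify");
        for (List<String> steps : Arrays.asList(naive, selective)) {
            LayerCacheDemo docker = new LayerCacheDemo();
            Map<String, String> context = new HashMap<>();
            context.put("pom.xml", "<project>v1</project>");
            context.put(app, "Hello World!");
            docker.build(steps, context);                      // cold build: every step misses
            context.put(app, "Hello, Docker!");                // change some source code
            // naive: [false, false]; selective: [true, true, false, false]
            System.out.println("source edit: " + docker.build(steps, context));
            context.put("pom.xml", "<project>v2</project>"); // change the POM
            System.out.println("POM edit:    " + docker.build(steps, context));
        }
    }
}
```

In this model, the single-ADD Dockerfile misses on both steps after a source edit, while the selective one still hits on the POM-only ADD and on the downloading RUN. Changing pom.xml invalidates everything from the first ADD onward, which matches the “unless we change the POM” caveat.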

Now the scenario is something like this:

touch/change some source code
docker build
maven downloads the world
maven compiles my project
docker run
touch/change some source code
docker build
maven compiles my project
docker run
touch/change some source code
docker build
maven compiles my project
docker run
…

Notice the “maven downloads the world” step only happens once (unless I actually change the POM, of course).

Final Thoughts

There might be better ways to handle some of this (e.g. the Maven Dependency Plugin’s dependency:resolve and dependency:resolve-plugins goals, though they don’t seem to work as thoroughly, and probably something with Fig, the tool that has since become Docker Compose), but I mainly wanted to highlight one possible use of selective adding and caching.

Other Notes:

For you Ruby+Gemfile, Python+requirements.txt, Node+package.json, Go+Godeps.json, etc. folks: Maven doesn’t have an explicit ‘install dependencies’ step, which is why the Dockerfile above runs a throwaway build against the POM instead. See Introduction to the Build Lifecycle if you’re bored.
For you Gradle folks, I haven’t used Gradle much. What are your thoughts?
The source code for this post is at: https://github.com/in-the-keyhole/khs-docker-caching-blog

Thanks for reading!

Reference:

Caching Strategy Reminder for Maven-Based Docker Builds from our JCG partner Luke Patterson at the Keyhole Software blog.