Caching Strategy Reminder for Maven-Based Docker Builds
My local development feedback loop between code change and runnable container was annoyingly long on a Maven-based project I was recently working on. I wanted to speed things up.
The scenario was something like this:
- touch/change some source code
- docker build
  - maven downloads the world
  - maven compiles my project
- docker run
- touch/change some source code
- docker build
  - maven downloads the world
  - maven compiles my project
- docker run
- touch/change some source code
- docker build
  - maven downloads the world
  - maven compiles my project
- docker run
- …
I didn’t really enjoy the “maven downloads the world” step, and wanted to minimize the number of times it needed to run.
Let’s follow along as I make my situation a little better. For illustration, we’ll start off with this generic archetype-created skeleton project:
```java
// src/main/java/com/keyholesoftware/blog/App.java
package com.keyholesoftware.blog;

public class App {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
```
```java
// src/test/java/com/keyholesoftware/blog/AppTest.java
package com.keyholesoftware.blog;

import junit.framework.*;

public class AppTest extends TestCase {
    public void testApp() {
        assertTrue(true);
    }
}
```
```dockerfile
# Dockerfile
FROM maven:3.2.5-jdk-8u40

RUN mkdir --parents /usr/src/app
WORKDIR /usr/src/app

ADD . /usr/src/app
RUN mvn verify
```
```xml
<!-- pom.xml -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.keyholesoftware.blog</groupId>
  <artifactId>khs-docker-caching-blog</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
```
Things aren’t that bad when I am building back-to-back, e.g.:
```shell
$ docker build .
...
$ docker build .
...
```
Notice that the second build is fast because everything is cached. But what about when we do something like this:
```shell
$ docker build .
...
$ touch src/main/java/com/keyholesoftware/blog/App.java
...
$ docker build .
...
```
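Notice that the second build is unnecessarily slowed down by the redownload portion. Once a source file counts as changed, the cached layer created by ADD . /usr/src/app can no longer be reused, and neither can any layer after it, so mvn verify starts again with an empty local repository.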
I sat around and despaired for a while until I remembered the tricks I’ve seen with selective caching:
```dockerfile
FROM maven:3.2.5-jdk-8u40

RUN mkdir --parents /usr/src/app
WORKDIR /usr/src/app

# selectively add the POM file
ADD pom.xml /usr/src/app/
# get all the downloads out of the way
RUN mvn verify clean --fail-never

ADD . /usr/src/app
RUN mvn verify
```
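Why the POM-first ordering helps can be sketched with a toy model of the layer cache. This is not Docker's actual algorithm. It only assumes that a layer can be reused when its parent layer, its instruction text, and (for ADD) the contents of the added files are all unchanged. The function names and the flat project directory (pom.xml next to App.java) are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Toy model of a layer cache (an illustration, not Docker's real algorithm):
# a layer's key is derived from its parent's key, the instruction text,
# and, when files are passed, the contents of those files.
layer_key() {
    parent=$1
    instruction=$2
    shift 2
    {
        printf '%s\n%s\n' "$parent" "$instruction"
        # Fold in file contents only when files were passed (the ADD case).
        if [ "$#" -gt 0 ]; then
            cat -- "$@"
        fi
    } | cksum | cut -d ' ' -f 1
}

# Key of the "ADD pom.xml" layer. $1 is a hypothetical project directory.
pom_layer_key() {
    layer_key base 'ADD pom.xml /usr/src/app/' "$1/pom.xml"
}

# Key of the "ADD ." layer, which sits on top of the POM-only warm-up layers.
source_layer_key() {
    warmup=$(layer_key "$(pom_layer_key "$1")" 'RUN mvn verify clean --fail-never')
    layer_key "$warmup" 'ADD . /usr/src/app' "$1/pom.xml" "$1/App.java"
}

# Demo: edit a source file and compare the keys before and after.
demo_dir=$(mktemp -d)
printf '<project/>\n' > "$demo_dir/pom.xml"
printf 'class App {}\n' > "$demo_dir/App.java"
pom_before=$(pom_layer_key "$demo_dir")
src_before=$(source_layer_key "$demo_dir")
printf '// edited\n' >> "$demo_dir/App.java"
pom_after=$(pom_layer_key "$demo_dir")
src_after=$(source_layer_key "$demo_dir")

if [ "$pom_before" = "$pom_after" ]; then
    echo "POM layer: key unchanged, warm-up downloads can be reused"
fi
if [ "$src_before" != "$src_after" ]; then
    echo "source layer: key changed, only this layer and later ones rebuild"
fi
rm -rf "$demo_dir"
```

In this model, a source edit leaves the key of the POM layer (and therefore of the warm-up layer built on it) untouched, which is the effect the Dockerfile above relies on. Editing pom.xml changes the first key, and every later key with it.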
Let’s try that sequence again.
```shell
$ docker build .
...
$ touch src/main/java/com/keyholesoftware/blog/App.java
...
$ docker build .
...
```
Getting better, but there were still a few downloads going on during the second build. They are related to the surefire test plugin: with only the POM present there are no tests to run, so surefire appears not to fetch its test provider until the real sources are added. This process actually helps us find downloads that are chosen dynamically, so we can lock them down. In this case, we lock down our surefire provider:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.keyholesoftware.blog</groupId>
  <artifactId>khs-docker-caching-blog</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <properties>
    <surefire.version>2.8.1</surefire.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>${surefire.version}</version>
        <!-- lock down our surefire provider -->
        <dependencies>
          <dependency>
            <groupId>org.apache.maven.surefire</groupId>
            <artifactId>surefire-junit3</artifactId>
            <version>${surefire.version}</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
</project>
```
Let’s try that sequence again.
```shell
$ docker build .
...
$ touch src/main/java/com/keyholesoftware/blog/App.java
...
$ docker build .
...
```
So now, unless we change the POM, we don’t have to redownload anything. Nice.
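One way to check this claim, rather than eyeballing the build output, is to count the download lines in a build log. The helper below is a small sketch: the name count_downloads is made up for this post, and it assumes Maven announces each fetched artifact on a line containing the word "Downloading", which matches the log format of the Maven version used here.

```shell
#!/bin/sh
# Count Maven download announcements in a build log read from stdin.
# Assumption: each fetched artifact produces a line containing "Downloading".
count_downloads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the zero case from failing.
    grep -c 'Downloading' || true
}
```

Usage might look like `docker build . 2>&1 | count_downloads`. The `2>&1` is there because some Docker versions write build progress to stderr, which is an assumption about your setup. After a source-only change with the locked-down POM, the count for the second build should be 0.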
Now the scenario is something like this:
- touch/change some source code
- docker build
  - maven downloads the world
  - maven compiles my project
- docker run
- touch/change some source code
- docker build
  - maven compiles my project
- docker run
- touch/change some source code
- docker build
  - maven compiles my project
- docker run
- …
Notice the “maven downloads the world” step only happens once (unless I actually change the POM, of course).
Final Thoughts
There might be better ways to handle some of this (e.g. dependency:resolve and dependency:resolve-plugins, although those don’t seem to work as thoroughly, and probably something with fig), but I mainly wanted to highlight a possible use of selective adding/caching.
Other Notes:
- For you Ruby+Rakefile, Python+requirements.txt, Node+package.json, Go+Godeps.json etc. folks: Maven doesn’t have an explicit ‘install dependencies’ step. See Introduction to the Build Lifecycle if you’re bored.
- For you Gradle folks, I haven’t used Gradle much. What are your thoughts?
- The source code for this post is at: https://github.com/in-the-keyhole/khs-docker-caching-blog
Thanks for reading!
Reference: Caching Strategy Reminder for Maven-Based Docker Builds from our JCG partner Luke Patterson at the Keyhole Software blog.