Creating Vulnerability Assessment Artifacts Using Maven Assembly
This article discusses using Maven Assembly to create artifacts that can be provided to third-party vulnerability assessment services (e.g., Veracode) for review.
Static Analysis for Bugs vs. Vulnerability Assessments
At this point everyone is aware of findbugs and uses it religiously, right?
Right?
Findbugs uses static analysis to find bugs. More precisely, it uses static analysis to find bugs that can be found by static analysis. For instance, I’ve seen this common pattern:

    public void foo(Object obj) {
        if (obj != null) {
            obj.doSomething();
        }
        // lots of obscuring code
        obj.doSomethingElse(); // obj may still be null here
    }
Should we check for null a second time? Did we need to check the first time? Should we have returned from the ‘if’ clause?
Why do we also need vulnerability assessment?
What is vulnerability assessment? How is it different from bugs?
The key concept is that vulnerable code can be superficially bug-free yet still open to misuse that attacks the site or its users.
An example of vulnerable code is using unsanitized user-provided values. Anyone working on the front-end should know the importance of sanitizing these values.
But what happens when user-provided data is passed out of the front-end, e.g., when it’s written to the database? Will everyone who pulls data from the database know that it might contain unsanitized user-provided data? What about malicious data put into the database via SQL injection?
Static analysis for vulnerability assessment looks a lot like static analysis to find bugs, just a lot more thorough. Whereas findbugs may take 5 minutes to run, Veracode may take a few hours!
(Dynamic analysis takes this a step further and runs the tests against a live system. You can do a light version of this using integration tests.)
Artifacts for Vulnerability Assessment
What do we need to provide for vulnerability assessments? The short answer is three things:
- our compiled code (e.g., java or scala)
- our scripted code (e.g., jsp)
- every jar file we depend on, recursively
We don’t need to provide our source code or resources. The compiled code does need to include debug information so the assessment can give meaningful error messages. Knowing only that there are 19 defects in a library containing 79 classes isn’t very helpful!
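As a rough sketch of how debug information might be requested in a Maven build, the fragment below configures the standard maven-compiler-plugin. Its `debug` parameter already defaults to true, so this only matters if your build has switched it off or narrowed `debuglevel`. Where the plugin sits in your pom hierarchy is an assumption, and Scala modules would need the equivalent setting in whatever compiler plugin they use.

```xml
<!-- Sketch only: belongs under <build><plugins> in the relevant pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- compile with debug information (javac -g) -->
    <debug>true</debug>
    <!-- keep line numbers, local variable names and source file names -->
    <debuglevel>lines,vars,source</debuglevel>
  </configuration>
</plugin>
```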
A good format is a tarball containing:
- our jars and wars at the top level, sans version number
- our dependencies under “/lib”, with version number
Version numbers are stripped from our own artifacts and retained on our dependencies for tracking purposes. Our code has continuity across multiple runs. Our dependencies can change at any time and don’t have any continuity beyond what’s explicitly indicated in the version numbers.
Our war files should be stripped of embedded jars since they’ll be present under the ‘lib’ directory. “Thick” war files just increase the size of the uploaded artifact.
We can build this with two maven assembly descriptors.
va-war.xml (vulnerability assessment skinny war)
The first assembly creates a stripped down .war file. I don’t want to call it a skinny war since the intended purpose is different but they have a lot in common.
    <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
        <id>va-war</id>
        <formats>
            <format>war</format>
        </formats>
        <includeBaseDirectory>false</includeBaseDirectory>
        <fileSets>
            <!-- grab everything except any jars -->
            <fileSet>
                <directory>target/${project.artifactId}-${project.version}</directory>
                <outputDirectory>/</outputDirectory>
                <includes />
                <excludes>
                    <exclude>**/*.jar</exclude>
                </excludes>
            </fileSet>
        </fileSets>
    </assembly>
You can exclude additional files if you have sensitive information or a lot of large artifacts:
    <excludes>
        <exclude>**/*.jar</exclude>
        <exclude>**/*.jks</exclude>
        <exclude>**/*.p12</exclude>
        <exclude>**/*.jpg</exclude>
        <exclude>**/db.properties</exclude>
    </excludes>
You need to be careful though – you need to include anything that’s scripted, e.g., jsp files or velocity templates.
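If the exclude list grows unwieldy, one possible alternative is to whitelist what the assessment needs instead. The fileSet below is only a sketch: the include patterns (compiled classes, the deployment descriptor, JSP-family files and Velocity templates) are assumptions about a typical webapp layout, not a list required by any particular assessment service, so adjust them to whatever scripted content your application actually ships.

```xml
<!-- Sketch only: a whitelist variant of the fileSet in va-war.xml -->
<fileSet>
  <directory>target/${project.artifactId}-${project.version}</directory>
  <outputDirectory>/</outputDirectory>
  <includes>
    <!-- deployment descriptor and compiled code -->
    <include>WEB-INF/web.xml</include>
    <include>WEB-INF/classes/**/*.class</include>
    <!-- scripted code: patterns are assumptions, extend as needed -->
    <include>**/*.jsp</include>
    <include>**/*.jspx</include>
    <include>**/*.tag</include>
    <include>**/*.vm</include>
  </includes>
</fileSet>
```

The trade-off is that a whitelist fails closed: a new kind of scripted file is silently left out of the assessment until someone adds its pattern.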
va-artifact.xml (vulnerability assessment artifact)
The second assembly collects all of the dependencies and stripped-down wars into a single tarball. Our jars and wars are at the top level of the tarball; all dependencies are in a ‘lib’ directory. This makes it easy to distinguish between our artifacts and our dependencies.
    <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
        <id>va-artifact</id>
        <formats>
            <format>tar.gz</format>
        </formats>
        <includeBaseDirectory>false</includeBaseDirectory>
        <dependencySets>
            <!-- ******************************************* -->
            <!-- Our code should not include version numbers -->
            <!-- ******************************************* -->
            <dependencySet>
                <includes>
                    <include>${project.groupId}:*:jar</include>
                    <include>${project.groupId}:*:va-war</include>
                    <!-- we could also include subprojects -->
                    <include>${project.groupId}.**:*:jar</include>
                </includes>
                <!-- we might have sensitive resources -->
                <excludes>
                    <exclude>${project.groupId}:*-properties</exclude>
                </excludes>
                <outputFileNameMapping>${artifact.artifactId}${dashClassifier?}.${artifact.extension}</outputFileNameMapping>
            </dependencySet>
            <!-- *********************************************** -->
            <!-- Our dependencies should include version numbers -->
            <!-- *********************************************** -->
            <dependencySet>
                <outputDirectory>lib</outputDirectory>
                <includes />
                <excludes>
                    <exclude>${project.groupId}:*</exclude>
                    <exclude>*.pom</exclude>
                    <!-- exclude standard APIs -->
                    <exclude>javax.*:*</exclude>
                    <exclude>dom4j:*</exclude>
                    <exclude>jaxen:*</exclude>
                    <exclude>jdom:*</exclude>
                    <exclude>xml-apis:*</exclude>
                </excludes>
            </dependencySet>
        </dependencySets>
    </assembly>
Building the Artifacts
The assembly descriptors are only half of the story. We still need to call maven assembly and we do not want to do it for every build.
This is an ideal time for profiles – we will only build artifacts when a specific profile is specified.
pom.xml for war modules
The necessary addition to the pom.xml file for war modules is modest. We need to call our assembly descriptor but we don’t need to explicitly add dependencies.
    <profiles>
        <profile>
            <id>vulnerability-assessment</id>
            <build>
                <plugins>
                    <plugin>
                        <artifactId>maven-assembly-plugin</artifactId>
                        <configuration>
                            <descriptors>
                                <descriptor>src/main/assembly/va-war.xml</descriptor>
                            </descriptors>
                        </configuration>
                        <executions>
                            <execution>
                                <id>make-assembly</id>
                                <phase>package</phase>
                                <goals>
                                    <goal>single</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>
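A profile like this is normally selected by id on the command line, e.g. `mvn package -Pvulnerability-assessment`. As an optional variation, a profile can also be activated by a system property, which can be convenient in CI jobs. The sketch below assumes a hypothetical property name, `va`; it is not part of the original setup.

```xml
<!-- Sketch only: the property name "va" is a hypothetical choice -->
<profile>
  <id>vulnerability-assessment</id>
  <activation>
    <property>
      <!-- active whenever -Dva is passed, e.g. "mvn package -Dva" -->
      <name>va</name>
    </property>
  </activation>
  <!-- the <build> section stays exactly as in the profile shown earlier -->
</profile>
```

Either way the assembly runs only when explicitly requested, so ordinary builds are unaffected.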
pom.xml for top-level modules
The necessary addition to the pom.xml file for the top-level module is more complex, especially when the distribution assembly is created in a submodule instead of the root module. In this case we need to explicitly add a dependency on both the pom files and the stripped-down war files. If we don’t specify the former we’ll lose most dependencies; if we don’t specify the latter we’ll lose the .war files.
    <profiles>
        <profile>
            <id>vulnerability-assessment</id>
            <build>
                <plugins>
                    <plugin>
                        <artifactId>maven-assembly-plugin</artifactId>
                        <configuration>
                            <descriptors>
                                <descriptor>src/main/assembly/va-artifact.xml</descriptor>
                            </descriptors>
                        </configuration>
                        <executions>
                            <execution>
                                <id>make-assembly</id>
                                <phase>package</phase>
                                <goals>
                                    <goal>single</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
            <dependencies>
                <!-- specify parent pom -->
                <dependency>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>parent</artifactId> <!-- FIXME -->
                    <version>${project.version}</version>
                    <type>pom</type>
                </dependency>
                <!-- specify each war file and corresponding pom file -->
                <dependency>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>webapp-1</artifactId> <!-- FIXME -->
                    <version>${project.version}</version>
                    <type>war</type>
                    <classifier>va-war</classifier>
                </dependency>
                <dependency>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>webapp-1</artifactId> <!-- FIXME -->
                    <version>${project.version}</version>
                    <type>pom</type>
                </dependency>
                <!-- second... -->
                <dependency>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>webapp-2</artifactId> <!-- FIXME -->
                    <version>${project.version}</version>
                    <type>war</type>
                    <classifier>va-war</classifier>
                </dependency>
                <dependency>
                    <groupId>${project.groupId}</groupId>
                    <artifactId>webapp-2</artifactId> <!-- FIXME -->
                    <version>${project.version}</version>
                    <type>pom</type>
                </dependency>
                <!-- and so on... -->
            </dependencies>
        </profile>
    </profiles>
One small gotcha!
There is one small gotcha in this specific approach. It is possible that the individual web modules will have dependencies on different versions of common libraries. Nobody wants this, but once projects reach a certain size you can’t afford the time and effort required to keep all of the modules in sync.
This information will be lost when we do dependency resolution at a common location.
I don’t consider this a problem for two reasons. First, we can perform vulnerability assessments at a finer granularity, essentially performing the analysis at the .war level instead of the .ear level. This guarantees the libraries will match but will tremendously increase our workload if we have a large number of web modules.
Second, our primary focus is the vulnerabilities in our code, not in specific versions of third-party libraries. Those libraries provide important hints to the assessment tools but we only want a full analysis of our code. We can always run separate assessments of the libraries we depend upon if it’s necessary.
Jenkins Veracode Plugin
Finally I want to point out that there’s a Jenkins plugin for Veracode analysis: Veracode Scanner Plugin. It can be used to schedule scans on a regular basis so you don’t find hundreds of defects when you finally remember to run a scan just days before a release.