Project Jigsaw: an incomplete puzzle
Mark Reinhold just recently proposed a delay of Java 9 to buy more time for completing project Jigsaw as the major feature of the upcoming release. While this decision will surely bring the doomsayers of Java back onto stage, I am personally quite relieved and think this was a good and necessary decision. The milestone for feature completion of Java 9 is currently set to the 10th of December, forbidding the introduction of new functionality after that date. But looking at early access builds of project Jigsaw, Java’s module system does not seem to be ready for this development stage.
Delays in project Jigsaw have become a habit over the latest Java release cycles. This must not be misinterpreted as incompetence but rather as an indicator for how difficult it is to introduce modules to Java that is currently a stranger to true modularization. Initially, the module system for Java was proposed in 2008 for inclusion in Java 7. But until today, Jigsaw’s implementation always turned out to be more difficult than anticipated. And after several suspensions and even a temporary abandonment, the stewards of Java are surely under pressure to finally succeed. It is great to see that this pressure did not push the Java team to rush for a release.
In this article, I try to summarize the state of project Jigsaw as I see it and as they were discussed publicly on the Jigsaw mailing list. I am writing this article as a contribution to the current discussion and to hopefully involve more people into the ongoing development process. I do no intend to downplay the hard work done by Oracle. I am stating this explicitly to avoid misinterpretation after the rather emotional discussions about Jigsaw following the concealment of sun.misc.Unsafe
.
Modularized reflection
What exactly is it that makes project Jigsaw such a difficult endeavor? Today, visibility modifiers are the closest approximation to encapsulating a class’s scope. Package-privacy can serve as an imperfect retainer of a type to its package. But for more complex applications that span internal APIs over multiple packages, visibility modifiers are insufficient and true modules become necessary. With project Jigsaw, classes can be truly encapsulated what makes them unavailable to some code even if those classes were declared to be public. However, Java programs that build on the assumption that all classes are always available at runtime might need to change fundamentally.
This change is most likely less fundamental for developers of end-user applications than for the maintainers of Java libraries and frameworks. A library is typically not aware of its user’s code during its compilation. For overcoming this limitation, a library can fallback to using reflection. This way, a container for dependency-injection (such as Spring) can instantiate bean instances of an application without the bean types being known to the framework at compile-time. For instantiating such objects, the container simply delays its work until runtime when it scans the application’s classpath and discovers the bean types which are now visible. For any of these types, the framework then locates a constructor which is invoked reflectively after resolving all injected dependencies.
Runtime discovery paired with reflection is used by a long list of Java frameworks. But in a modularized environment, running the previous runtime resolution is no longer possible without addressing module boundaries. With project Jigsaw, the Java runtime asserts that every module only accesses modules that are declared as a dependency in the accessing module’s descriptor. Additionally, the imported module must export the classes in question to its accessor. A modularized version of the dependency-injection container cannot declare any user module as a dependency and it is then forbidden reflective access. This would result in a runtime error when instantiating a non-imported class.
To overcome this limitation, project Jigsaw adds a new API that allows to include additional module dependencies at runtime. After making use of this API and adding all user modules, the modularized dependency-injection container can now continue to instantiate bean types that it does not know at compile-time.
But does this new API really solve the problem? From a purely functional point of view, this additional API allows for the migration of a library to retain its functionality even after being repackaged as a module. But unfortunately, the runtime enforcement of module boundaries creates a requirement for a ceremonial dance preceding the use of most reflection code. Before a method is invoked, the caller needs to always assure that the corresponding module is already a dependency of the caller. If a framework forgets to add this check, a runtime error is thrown without any chance of discovery during compilation.
With reflection being used excessively by many libraries and frameworks, it is unlikely that this change in accessibility is going to improve runtime encapsulation. Even if a security manager would restrict frameworks from adding runtime module dependencies, enforcing such boundaries would probably break most existing applications. More realistically, most breaches of module boundaries will not indicate true errors but be caused by improperly migrated code. At the same time, the runtime restriction is neither likely to improve encapsulation if most frameworks preemptively attain access to most user modules.
This requirement does of course not apply when a module uses reflection on its own types but such use of reflection is rather rare in practice and can be substituted by the use of polymorphism. In my eyes, enforcing module boundaries when using reflection contradicts its primary use case and makes the already non-trivial reflection API even more difficult to use.
Modularized resources
Beyond this limitation, it is currently unclear how the dependency-injection container would even discover the classes that it should instantiate. In a non-modularized application, a framework can for example expect a file of a given name to exist on the classpath. This file then serves as an entry point for describing how user code can be discovered. This file is typically obtained by requesting a named resource from a class loader. With project Jigsaw, this might no longer be possible when the required resource is also encapsulated within a module’s boundaries. As far as I know, the final state of resource encapsulation is not yet fully determined. When trying current early access builds, resources of foreign modules can however not be accessed.
Of course, this problem is also addressed in project Jigsaw’s current draft. To overcome module boundaries, Java’s preexisting ServiceLoader
class is granted super powers. For making specific classes available to other modules, a module descriptor provides a special syntax that enables leaking certain classes through module boundaries. Applying this syntax, a framework module declares that it provides a certain service. A user library then declares an implementation of the same service to be accessible to the framework. At runtime, the framework module looks up any implementation of its service using the service loader API. This can serve as way for discovering other modules at runtime and could substitute resource discovery.
While this solution seems elegant on a first glance, I remain sceptical of this proposal. The service loader API is fairly simple to use but at the same time, it is very limited in its capabilities. Furthermore, few people have adapted it for their own code which could be seen as an indicator for its limited scope. Unfortunately, only time can tell if this API accommodates all use cases in a sufficient manner. At the same time it is granted that a single Java class gets tied deeply into the Java runtime, making deprecation and substitution of the service loader API almost impossible. In the context of the history of Java which has already told many stories about ideas that seemed good but turned sour, I find it precarious to create such a magical hub that could easily turn out to be an implementation bottleneck.
Finally, it remains unclear how resources are exposed in modularized applications. While Jigsaw does not break any binary compatibility, returning null
from a call to ClassLoader::getResource
where a value was always returned previously might just bury applications under piles of null pointer exceptions. As an example, code manipulation tools require a means to locate class files which are now encapsulated what would at a minimum hinder their adoption process.
Optional dependencies
Another use case that the service loader API does not accommodate is the declaration of optional dependencies. In many cases, optional dependencies are not considered a good practice but in reality they offer a convenient way out if dependencies can be combined in a large number of permutations.
For example, a library might be able to provide better performance if a specific dependency is available. Otherwise, it would fall back to another, less optimal alternative. In order to use the optional dependency, the library is required to compile against its specific API. If this API is however not available at runtime, the library needs to assure that the optional code is never executed and fall back to the available default. Such an optional dependency cannot be expressed in a modularized environment where any declared module dependency is validated upon the application’s startup, even if the dependency was never used.
A special use-case for optional dependencies are optional annotation bundles. Today, the Java runtime treats annotations as optional metadata. This means that if an annotation’s type cannot be located by a class loader, the Java runtime simply ignores the annotation in question instead of throwing a NoClassDefFoundError
. For example, the FindBugs application offers an annotation bundle for suppressing potential bugs after a user found the code in question to be a false-positive. During an application’s regular runtime, the FindBugs-specific annotations are not required and are therefore not included in the application bundle. However, when running FindBugs, the utility explicitly adds the annotation package such that the annotations become visible. In project Jigsaw, this is no longer possible. The annotation type is only available if a module declares a dependency to the annotation bundle. If this dependency is later missing at runtime, an error is raised, despite the annotation’s irrelevance.
Non-modularization
Not bundling a framework as a module in Java 9 is of course the easiest way to avoid all of the discussed restrictions. The Java runtime considers any non-modularized jar-file to be part of a class loader’s so-called unnamed module. This unnamed module defines an implicit dependency on all other modules that exist within the running application and exports all of its packages to any other module. This serves as a fallback when mixing modularized and non-modularized code. Due to the implicit imports and exports of an unnamed module, all non-migrated code should continue to function as before.
While such an opt-out might be the best solution for a reflection-heavy framework, slow adoption of project Jigsaw does also defeat the purpose of a module system. With a lack of time being the major constraint of most open-source projects, this outcome is unfortunately quite probable. Furthermore, many open-source developers are bound to compiling their libraries to older versions of Java. Due to the different runtime behavior of modularized and non-modularized code, a framework would need to maintain two branches for being able to use Java 9 APIs to traverse module boundaries in the modularized bundle. It is unlikely that many open-source developers would make the time for such a hybrid solution.
Code instrumentation
In Java, reflective method access is not the only way of a library to interact with unknown user code. Using the instrumentation API, it is possible to redefine classes to include additional method calls. This is commonly used to for example implement method-level security or to collect code metrics.
When instrumenting code, the class file of a Java class is typically altered right before it is loaded by a class loader. Since a class transformation is typically applied immediately before class loading, it is currently impossible to preemptively alter the module graph as an unloaded class’s module is unknown. This might cause unresolvable conflicts that are impossible to resolve if the instrumenting code cannot access a loaded class before its first usage.
Summary
Software estimates are difficult and we all tend to underestimate the complexity of our applications. Project Jigsaw imposes a fundamental change to the runtime behavior of Java applications and it makes perfect sense to delay the release until every eventuality is thoroughly evaluated. Currently, there are too many open questions and it is a good choice to delay the release date.
I would prefer that module boundaries were not enforced by the runtime at all but remain a compiler construct. The Java platform already implements compile-time erasure of generic types and despite some imperfections and this solution has worked very well. Without runtime enforcement, modules would also be optional to adopt for dynamic languages on the JVM where the same form of modularization as in Java might not make sense. Finally, I feel that the current strict form of runtime encapsulation tries to solve a problem that does not exist. After working with Java for many years, I have rarely encountered situations where the unintentional usage of internal APIs has caused big problems. In contrast, I remember many occasions where abusing an API that was meant to be private has solved a problem that I could not have worked around. At the same time, other symptoms of lacking modules in Java, often referred to as jar hell, remain unsolved by Jigsaw which does not distinguish between different versions of a module.
Finally, I argue that backwards compatibility applies beyond the binary level. As a matter of fact, a binary incompatibility is usually easier to deal with than a behavioral change. In this context, Java has done a great job over the years. Therefore, method contracts should be respected just as highly as binary compatibility. While project Jigsaw does not technically break method contracts by providing unnamed modules, modularization makes subtle changes to code behavior that is based on its bundling. In my opinion, this will be confusing to both experienced Java developers and newcomers and result in reappearing runtime errors.
This is why I find the price for enforcing runtime module boundaries too high compared to the benefits that it offers. OSGi, a runtime module system with versioning capabilities already exists for those that really require modularization. As a big benefit, OSGi is implemented on top of the virtual machine and can therefore not influence VM behavior. Alternatively, I think that Jigsaw could include a canonical way for libraries to opt-out of runtime constraints where it makes sense such as for reflection-heavy libraries.
Reference: | Project Jigsaw: an incomplete puzzle from our JCG partner Rafael Winterhalter at the My daily Java blog. |