Spring Batch Restartability
First of all, I’d like to give a big thank you to the wonderful folks from Spring who have given countless hours of their time to ensure the viability of Spring Batch jobs, and the seemingly magical ability to issue a restart on a job! Thank you for this elegant toolset that permits us to tear through massive datasets, while enabling us to dust ourselves off when we fall down!
While acknowledging that I still have much to learn, I’d like to share my well-learned lessons in the world of restartability. This post will include how to identify improper usage of Spring Batch’s Step & Job ExecutionContext as well as how to write good, wholesome components for Spring Batch.
Statefulness!
Statefulness is basically fancy talk for beans that have global variables that change.
As an example, take a one-dollar bill. It would be considered stateless, as its value is constant. On the other hand, take a stock like Google; its price fluctuates and its value would be considered variable (or stateful).
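To put the analogy into code, here is a minimal plain-Java sketch. The class names `DollarBill` and `StockQuote` are hypothetical, chosen only to mirror the example above.

```java
// Stateless: the value is fixed and never changes after construction.
final class DollarBill {
    private final int cents = 100;

    int valueInCents() {
        return cents;
    }
}

// Stateful: the price field changes over the object's lifetime, so two calls
// to priceInCents() can return different answers.
class StockQuote {
    private long priceInCents;

    StockQuote(long openingPriceInCents) {
        this.priceInCents = openingPriceInCents;
    }

    void update(long newPriceInCents) {
        this.priceInCents = newPriceInCents;
    }

    long priceInCents() {
        return priceInCents;
    }
}
```

If a job dies halfway through, a freshly created DollarBill is indistinguishable from the old one. A freshly created StockQuote has lost whatever price it last held, unless that price was saved somewhere that survives the failure.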
ExecutionContext
To maintain statefulness, Spring gives us access to the ExecutionContext for both Step & Job so that we may store the information our job needs to stay on track and complete properly.
Anything stateful in your batch code base threatens the viability of its restartability. Processors, readers, writers, or anything that gets used by your batch operation, should be considered at risk when stateful.
What information could be maintained in the ExecutionContext?
Technically, I guess any serializable object could be placed into an ExecutionContext at any time, but I’d say that is a dangerous way to think. ExecutionContext updates should be handled in a very transactional way.
What information should be maintained in the ExecutionContext?
I would recommend only keeping primitive/pseudo-primitive simple values here. If you want to go to sleep easily at night, I would also recommend only writing these values through an ItemProcessor OR a method annotated with @BeforeStep or @AfterStep.
What should NOT happen?
The ExecutionContext should not be introduced and passed around in core business logic. Context values should not be updated in the middle of step execution. Additionally, you should avoid introducing a mutable value holder object into an ExecutionContext, as its reference can easily corrupt the values behind transactional boundaries.
When I see these types of examples in code, I consider them a threat to the application’s restartability and refuse to certify that application’s restartability.
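The mutable value holder problem is easiest to see in a small plain-Java sketch. Everything here is a simplification I am assuming for illustration: a `HashMap` stands in for the ExecutionContext, a shallow copy stands in for a checkpoint taken at a chunk boundary, and `Counter` is a hypothetical holder class. It shows the reference-sharing hazard only; it is not a model of how Spring Batch persists the context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable value holder of the kind warned about above.
class Counter {
    long value;
}

class HolderHazardDemo {
    // Stand-in for a checkpoint: a shallow copy of the context entries.
    // Object references inside the copy are shared with the live context.
    static Map<String, Object> checkpoint(Map<String, Object> context) {
        return new HashMap<>(context);
    }

    // Stores a mutable holder, checkpoints, then keeps mutating the holder.
    static long holderValueSeenByCheckpoint() {
        Map<String, Object> context = new HashMap<>();
        Counter counter = new Counter();
        counter.value = 10;
        context.put("count", counter);
        Map<String, Object> saved = checkpoint(context);
        counter.value = 25; // work done after the boundary, not yet committed
        return ((Counter) saved.get("count")).value;
    }

    // Same sequence, but with a simple immutable Long instead of a holder.
    static long simpleValueSeenByCheckpoint() {
        Map<String, Object> context = new HashMap<>();
        long count = 10;
        context.put("count", count);
        Map<String, Object> saved = checkpoint(context);
        count = 25; // only the local variable changes
        return (Long) saved.get("count");
    }
}
```

In this sketch the holder version reports 25 from the checkpoint even though only 10 was ever "committed", while the simple-value version still reports 10. That is the reasoning behind keeping only simple values in the context and writing them at well-defined points.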
In general, there is not a one-size-fits-all approach to ensuring that your job code is written in a way that guarantees that stateful information has been handled properly. However, I will tell you that you need to be thinking about:
- How transactions are completed (distributed, partitioned, multithreaded, etc.)
- How chunk progress is tracked
- How your reads are sorted/grouped
- What information will be needed at restart time?
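To see how those questions interact, here is a rough plain-Java simulation of chunked processing with a restart. It does not use Spring Batch at all: the `Map` is a stand-in for the ExecutionContext, the `sink` list is a stand-in for the output store, and `RestartSketch`, the `read.count` key, and the `failAt` parameter are all hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RestartSketch {
    static final String READ_COUNT = "read.count"; // hypothetical key name

    // Processes items in chunks, starting from the count saved in the context.
    // The saved count only advances after a whole chunk has been written.
    // failAt is the index of an item whose processing throws, or -1 for none.
    static void run(List<String> items, int chunkSize, Map<String, Long> context,
                    List<String> sink, int failAt) {
        int position = context.getOrDefault(READ_COUNT, 0L).intValue();
        while (position < items.size()) {
            int end = Math.min(position + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    // The partial chunk is discarded; the context still holds
                    // the count from the last fully written chunk.
                    throw new IllegalStateException("failed at item " + i);
                }
                chunk.add(items.get(i).toUpperCase());
            }
            sink.addAll(chunk);                       // write the chunk's output
            position = end;
            context.put(READ_COUNT, (long) position); // record progress with it
        }
    }
}
```

Running this over five items with a chunk size of 2 and a failure at index 3 leaves the first two items written and a saved count of 2. Calling `run` again with the same context and no failure picks up at index 2 and writes the rest exactly once. Note that this only works because the items come back in the same order on the second run, which is why the sorting question in the list above matters.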
Here is a general practice example for updating job-relevant stateful information:
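import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

/* This could be a reader, or a writer, or maybe a processor... you need to identify
   when and where it is appropriate to perform these tracking activities.
   Remember to think about restarts! */
@Component
@StepScope
public class FooComponent implements ItemStream {

    // The key this component reads from and writes to. The two must match,
    // or a restart will not see the value saved by the previous run.
    public static final String KEY = "fooStatefulCount";

    // a perfectly acceptable way to read a value from an ExecutionContext from anywhere!
    // The "?: 0" fallback covers the first run, when the key is not there yet.
    @Value("#{stepExecutionContext['fooStatefulCount'] ?: 0}")
    long statefulCount = 0; // a read count perhaps?

    @Override
    public void open(ExecutionContext cxt) throws ItemStreamException {
        cxt.put(KEY, statefulCount);
    }

    @Override
    public void update(ExecutionContext cxt) throws ItemStreamException {
        cxt.put(KEY, statefulCount);
    }

    @Override
    public void close() throws ItemStreamException {}
}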
If you want a more comprehensive example, go look through the open method in AbstractItemCountingItemStreamItemReader!
Final Thoughts
My final advice to other developers is to strive to be fundamentally and completely ordinary when writing your code. Simplicity will lend understanding to the future, and subsequently, the business owners will cherish your gift of an application that is largely free of technical debt.
Reference: Spring Batch Restartability from our JCG partner Ryan McCullough at the Keyhole Software blog.