Take control of your slow producers with READ-BEHIND CACHE
In our connected world, we often consume data from APIs that we don't own and can't do anything to improve. If all goes well, their performance is good and everybody's happy. But too often we have to rely on APIs with less than optimal latency.
Of course, the obvious answer is to cache that data. But a cache whose staleness you can't detect is a dangerous thing, so caching on its own is not a proper solution.
Therefore… we're stuck. We'll need to get used to waiting for our page to load, or invest in a really nice spinner to entertain the users while they wait for the data. Or… are we? What if, for a small, calculated compromise, we could get the performance we want out of the same slow producer?
I think everybody has heard about the write-behind cache. It's a cache implementation that registers a write to be performed asynchronously: the caller is free to continue its business while the write is carried out by a background task.
What if we adopted this idea for the read side of the problem? Let's have a read-behind cache for our slow producers.
Fair warning: this technique applies only to data that we can afford to be stale for a limited number of requests. So if you can accept that your data will be “eventually fresh”, you can apply it.
I’ll use Spring Boot to build my application. All the code presented can be accessed on GitHub: https://github.com/bulzanstefan/read-behind-presentation. There are 3 branches for different stages of the implementation.
The code samples contain only the relevant lines for brevity.
STATUS QUO
branch: status-quo
So, we'll start with the status quo. Firstly, we have a slow producer that receives a URL parameter. To keep things simple, our producer will sleep for 5 seconds and then return a timestamp (of course, a timestamp is not a good example of rarely changing data, but for our purposes it makes it easy to see exactly when the data was refreshed).
```java
public static final SimpleDateFormat SIMPLE_DATE_FORMAT = new SimpleDateFormat("HH:mm:ss.SSS");

@GetMapping
String produce(@RequestParam String name) throws InterruptedException {
    Thread.sleep(5000);
    return name + " : " + SIMPLE_DATE_FORMAT.format(new Date());
}
```
In the consumer we just make a call to the producer:
```java
// ConsumerController.java
@GetMapping
public String consume(@RequestParam(required = false) String name) {
    return producerClient.performRequest(ofNullable(name).orElse("default"));
}

// ProducerClient.java
@Component
class ProducerClient {
    public String performRequest(String name) {
        // the producer runs on http://localhost:8888 (see "Running the sample code" below)
        return new RestTemplate()
                .getForEntity("http://localhost:8888/producer?name={name}", String.class, name)
                .getBody();
    }
}
```
SIMPLE CACHE
branch: simple-cache
To enable the simple cache in Spring, we need to:
- add the dependency org.springframework.boot:spring-boot-starter-cache
- enable the cache in application.properties: spring.cache.type=simple
- add the @EnableCaching annotation to the Spring application main class (see the sketch after this list)
- add @Cacheable("cacheName") to the method we want to cache
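For reference, the application class could look like the minimal sketch below. This is only an illustration, not code from the article, and the class name is made up.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cache.annotation.EnableCaching;

// Minimal sketch (class name is illustrative): @EnableCaching turns on Spring's
// cache abstraction so that @Cacheable methods are actually cached.
@SpringBootApplication
@EnableCaching
public class ConsumerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConsumerApplication.class, args);
    }
}
```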
Now we have a simple cache in place. This would also work with a distributed cache, but for this example we'll stick with the in-memory one. The consumer caches the data, and after the first call the latency is gone. But the data quickly becomes stale, and nobody evicts it. We can do better!
INTERCEPT THE CALL
branch: master
The next thing we need to do is intercept the call when it happens, regardless of whether the result is already cached.
In order to do this, we need to:
- create a custom annotation: @ReadBehind (a minimal sketch of the annotation follows this list)
- register an aspect that will intercept calls to methods annotated with @ReadBehind
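The article does not show the body of the annotation itself; a minimal version could look like this (the retention and target values are assumptions, chosen so the aspect can match the annotation at runtime):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marker annotation: methods carrying it will have their invocations recorded by the
// read-behind aspect. RUNTIME retention is required for the pointcut to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface ReadBehind {
}
```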
We then add this annotation to the performRequest method, alongside @Cacheable:
```java
@ReadBehind
@Cacheable(value = CACHE_NAME, keyGenerator = "myKeyGenerator")
public String performRequest(String name) {
```
As you can see, a CACHE_NAME constant was defined. If you need to set the cache name dynamically, you can use a CacheResolver and a configuration property (a sketch follows after the key generator below). Also, in order to control the key structure, we need to define a key generator:
```java
@Bean
KeyGenerator myKeyGenerator() {
    return (target, method, params) -> Stream.of(params)
            .map(String::valueOf)
            .collect(joining("-"));
}
```
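For the dynamic cache name mentioned above, a CacheResolver bean is one option. The following is only a sketch under assumptions not made in the article (the producer.cache-name property and the bean name are made up); it would be referenced from the cached method with @Cacheable(cacheResolver = "dynamicCacheResolver"):

```java
import static java.util.Objects.requireNonNull;

import java.util.List;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.cache.CacheManager;
import org.springframework.cache.interceptor.CacheResolver;
import org.springframework.context.annotation.Bean;

// Sketch (inside a @Configuration class): resolve the target cache from a
// configuration property instead of a hard-coded constant.
@Bean
CacheResolver dynamicCacheResolver(CacheManager cacheManager,
                                   @Value("${producer.cache-name:producerCache}") String cacheName) {
    return context -> List.of(requireNonNull(cacheManager.getCache(cacheName)));
}
```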
Furthermore, in order to add the aspect, we need to:
- add the dependency org.springframework.boot:spring-boot-starter-aop
- create the aspect class
- implement the Ordered interface and return 1 from the getOrder method. This is needed so the aspect still kicks in when the cache mechanism suppresses the method call because the value is already in the cache
```java
@Aspect
@Component
public class ReadBehindAdvice implements Ordered {

    @Before("@annotation(ReadBehind)")
    public Object cacheInvocation(JoinPoint joinPoint) {
        ...
    }

    @Override
    public int getOrder() {
        return 1;
    }
}
```
Now we have a way to intercept all calls to methods annotated with @ReadBehind.
REMEMBER THE CALL
Now that we have the call, we need to save all the data needed to perform it again from another thread.
For this, we need to retain:
- the bean that was called
- the arguments of the call
- the name of the method
```java
@Before("@annotation(ReadBehind)")
public Object cacheInvocation(JoinPoint joinPoint) {
    invocations.addInvocation(new CachedInvocation(joinPoint));
    return null;
}
```
```java
public CachedInvocation(JoinPoint joinPoint) {
    targetBean = joinPoint.getTarget();
    arguments = joinPoint.getArgs();
    targetMethodName = joinPoint.getSignature().getName();
}
```
We'll keep these objects in a different bean:
```java
@Component
public class CachedInvocations {

    private final Set<CachedInvocation> invocations = synchronizedSet(new HashSet<>());

    public void addInvocation(CachedInvocation invocation) {
        invocations.add(invocation);
    }
}
```
Keeping the invocations in a set and processing them with a scheduled job at a fixed rate also gives us a nice side effect: the calls to the external API are throttled.
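For the set to actually collapse repeated identical calls, CachedInvocation needs value-based equals and hashCode. The article does not show the full class; a possible shape is sketched below, assuming Lombok (which the project already uses for @RequiredArgsConstructor) generates them, and assuming a targetMethod field because the cache-refresh code later calls getTargetMethod():

```java
import java.lang.reflect.Method;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.reflect.MethodSignature;

import lombok.EqualsAndHashCode;
import lombok.Getter;

// Sketch of CachedInvocation (an assumption, not the repository's exact code).
// @EqualsAndHashCode makes duplicate invocations (same bean, method, and arguments)
// collapse into a single set entry, which is what throttles the refresh calls.
@Getter
@EqualsAndHashCode
public class CachedInvocation {

    private final Object targetBean;
    private final Object[] arguments;
    private final String targetMethodName;
    private final Method targetMethod;

    public CachedInvocation(JoinPoint joinPoint) {
        targetBean = joinPoint.getTarget();
        arguments = joinPoint.getArgs();
        targetMethodName = joinPoint.getSignature().getName();
        targetMethod = ((MethodSignature) joinPoint.getSignature()).getMethod();
    }
}
```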
SCHEDULE THE READ-BEHIND JOB
Now that we know which calls were performed, we can start a scheduled job to take those calls and refresh the data in the cache.
In order to schedule a job in the Spring Framework, we need to:
- add the @EnableScheduling annotation to the Spring application class
- create a job class with a method annotated with @Scheduled
```java
@Component
@RequiredArgsConstructor
public class ReadBehindJob {

    private final CachedInvocations invocations;

    @Scheduled(fixedDelay = 10000)
    public void job() {
        invocations.nextInvocations()
                .forEach(this::refreshForInvocation);
    }
}
```
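The CachedInvocations bean shown earlier only exposes addInvocation; the job also relies on nextInvocations(). Its body is not shown in the article, but it could look like the sketch below, which drains the pending set so each distinct invocation is refreshed once per scheduler run:

```java
// Possible implementation inside CachedInvocations (an assumption, not code from the
// article): copy and clear the pending invocations atomically. Collections.synchronizedSet
// only guards single operations, so the bulk copy needs an explicit synchronized block.
public Set<CachedInvocation> nextInvocations() {
    synchronized (invocations) {
        Set<CachedInvocation> next = new HashSet<>(invocations);
        invocations.clear();
        return next;
    }
}
```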
REFRESH THE CACHE
Now that we have all the information collected, we can make the real call on the read-behind thread and update the information in the cache.
Firstly, we need to call the real method:
```java
private Object execute(CachedInvocation invocation) {
    final MethodInvoker invoker = new MethodInvoker();
    invoker.setTargetObject(invocation.getTargetBean());
    invoker.setArguments(invocation.getArguments());
    invoker.setTargetMethod(invocation.getTargetMethodName());
    try {
        invoker.prepare();
        return invoker.invoke();
    } catch (Exception e) {
        log.error("Error when trying to reload the cache entries", e);
        return null;
    }
}
```
Now that we have the fresh data, we need to update the cache.
First, we calculate the cache key. For this, we have to use the same key generator that was defined for the cache.
Now that we have all the information needed to update the cache, we take the cache reference and update the value:
```java
private final CacheManager cacheManager;
...

private void refreshForInvocation(CachedInvocation invocation) {
    var result = execute(invocation);
    if (result != null) {
        var cacheKey = keyGenerator.generate(invocation.getTargetBean(),
                invocation.getTargetMethod(),
                invocation.getArguments());
        var cache = cacheManager.getCache(CACHE_NAME);
        cache.put(cacheKey, result);
    }
}
```
And with this, we have finished the implementation of our read-behind idea. Of course, there are still other concerns you need to address.
For example, instead of a scheduler you could trigger the refresh on a separate thread immediately after the invocation is recorded. That ensures the cache is refreshed at the earliest possible moment, so if staleness is a major concern for you, you should do that.
I like the scheduler because it also acts as a throttling mechanism: if you make the same call over and over again, the read-behind scheduler will collapse those calls into a single one.
RUNNING THE SAMPLE CODE
- Prerequisites: have Java 11+ installed
- Download or clone the code https://github.com/bulzanstefan/read-behind-presentation
- build the producer: mvnw package or mvnw.bat package
- run the producer: java -jar target\producer.jar
- build the consumer: mvnw package or mvnw.bat package
- run the consumer: java -jar target\consumer.jar
- access the producer: http://localhost:8888/producer?name=test
- access the consumer: http://localhost:8080/consumer?name=abc
- the consumer will return updated values after ~15 seconds (10 seconds for the scheduler plus 5 seconds for the new request), but no latency should be visible after the first call
WARNING
As I said at the beginning of this article, there are some things you should be aware of when implementing read-behind.
If you can't afford eventual consistency, don't do it.
This approach is suitable for APIs with high-frequency reads and low-frequency changes.
If the API has some sort of ACL implemented, you need to include the username used for the request in the cache key. Otherwise, very bad things can happen.
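One way to do that is to prefix the generated key with the current principal. The sketch below assumes Spring Security is on the classpath, which the article does not state; note that in a read-behind setup the username would also have to be captured in CachedInvocation, because the scheduled refresh runs on a thread that has no security context:

```java
import static java.util.stream.Collectors.joining;

import java.util.stream.Stream;

import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Bean;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;

// Sketch of a user-aware key generator (an assumption, not from the article):
// the key starts with the caller's username, so cached entries are never shared between users.
@Bean
KeyGenerator myKeyGenerator() {
    return (target, method, params) -> {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        String user = auth != null ? auth.getName() : "anonymous";
        return user + "-" + Stream.of(params)
                .map(String::valueOf)
                .collect(joining("-"));
    };
}
```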
Therefore, analyze your application carefully and apply this idea only where appropriate.