
Implementing correlation ids in Spring Boot (for distributed tracing in SOA/microservices)

After attending Sam Newman’s microservice talks at Geecon last week, I started to think more about what is most likely an essential feature of service-oriented / microservice platforms for monitoring, reporting and diagnostics: correlation ids. Correlation ids allow distributed tracing within complex service-oriented platforms, where a single request into the application can often be handled by multiple downstream services. Without the ability to correlate downstream service requests, it can be very difficult to understand how requests are being handled within your platform.

I’ve seen the benefit of correlation ids in several recent SOA projects I have worked on, but as Sam mentioned in his talks, it’s often very easy to think this type of tracing won’t be needed when building the initial version of the application, and then very difficult to retrofit it into the application when you do realise the benefits (and the need for it!). I’ve not yet found the perfect way to implement correlation ids within a Java/Spring-based application, but after chatting to Sam via email he made several suggestions, which I have now turned into a simple Spring Boot project that demonstrates how this could be implemented.

Why?

During both of Sam’s Geecon talks he mentioned that, in his experience, correlation ids were very useful for diagnostic purposes. A correlation id is essentially an id that is generated and associated with a single (typically user-driven) request into the application, and which is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application are typically ‘fanned out’ to, or handled by, multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped. So-called ‘distributed tracing’ can then be performed by combining the logs of all the downstream services and matching on the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as Logstash).
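To make the idea concrete, here is a minimal sketch of that log-matching step in plain Java. It assumes the logs from each service have already been collected into memory as lists of lines keyed by a (hypothetical) service name, and that every log line includes the correlation id as text; a centralised logging tool would normally do the same job with a search query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: "distributed tracing" by log correlation. Service names and log lines are hypothetical.
class LogCorrelation {

    /** Returns "service: line" entries for every line mentioning the id, service by service. */
    static List<String> traceFor(String correlationId, Map<String, List<String>> logsByService) {
        List<String> trace = new ArrayList<>();
        logsByService.forEach((service, lines) -> {
            for (String line : lines) {
                if (line.contains(correlationId)) {
                    trace.add(service + ": " + line);
                }
            }
        });
        return trace;
    }
}
```

Grouping service by service (in the iteration order of the Map you pass in) is the simplest option; to see the true interleaving of a request you would also need to sort the matched lines by timestamp.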

The big players in the service-oriented field have been talking about the need for distributed tracing and correlating requests for quite some time: Twitter has created its open-source Zipkin framework (which often plugs into its RPC framework, Finagle), and Netflix has open-sourced its Karyon web/microservice framework, both of which provide distributed tracing. There are of course commercial offerings in this area, one such product being AppDynamics, which is very cool but has a rather hefty price tag.

Creating a proof-of-concept in Spring Boot

As great as Zipkin and Karyon are, they are both relatively invasive, in that you have to build your services on top of the (often opinionated) frameworks. This might be fine for some use cases, but not so much for others, especially when you are building microservices. I’ve been enjoying experimenting with Spring Boot of late; this framework builds on the well-known and much-loved (at least by me!) Spring framework by providing lots of preconfigured, sensible defaults, which allows you to build microservices (especially ones that communicate via RESTful interfaces) very rapidly. The remainder of this blog post explains how I implemented correlation ids in a (hopefully) non-invasive way.

Goals

  1. Allow a correlation id to be generated for an initial request into the application
  2. Enable the correlation id to be passed to downstream services, using a method that is as non-invasive to the code as possible

Implementation

I have created two projects on GitHub: one containing an implementation where all requests are handled in a synchronous style (i.e., the traditional Spring approach of handling all request processing on a single thread), and one for when an asynchronous (non-blocking) style of communication is used (i.e., using the Servlet 3 asynchronous support combined with Spring’s DeferredResult and Java’s Futures/Callables). The majority of this article describes the asynchronous implementation, as this is the more interesting of the two.

The main work in both code bases is undertaken by the CorrelationHeaderFilter, a standard Java EE Filter that inspects the HttpServletRequest headers for the presence of a correlation id. If one is found, we set a ThreadLocal variable in the RequestCorrelation Class (discussed later); if not, a new id is generated and stored in the RequestCorrelation Class instead:

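import java.io.IOException;
import java.util.UUID;

import javax.servlet.DispatcherType;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CorrelationHeaderFilter implements Filter {

    private static final Logger LOGGER = LoggerFactory.getLogger(CorrelationHeaderFilter.class);

    //...

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {

        final HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest;
        // same header name that CorrelatingRestClient sends downstream
        String currentCorrId = httpServletRequest.getHeader(RequestCorrelation.CORRELATION_ID);

        // skip the async re-dispatch: the id was already set when the request first arrived
        if (!currentRequestIsAsyncDispatcher(httpServletRequest)) {
            if (currentCorrId == null) {
                currentCorrId = UUID.randomUUID().toString();
                LOGGER.info("No correlationId found in Header. Generated : " + currentCorrId);
            } else {
                LOGGER.info("Found correlationId in Header : " + currentCorrId);
            }

            RequestCorrelation.setId(currentCorrId);
        }

        filterChain.doFilter(httpServletRequest, servletResponse);
    }


    //...

    private boolean currentRequestIsAsyncDispatcher(HttpServletRequest httpServletRequest) {
        return httpServletRequest.getDispatcherType() == DispatcherType.ASYNC;
    }
}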

The only thing in this code that may not be instantly obvious is the conditional check currentRequestIsAsyncDispatcher(httpServletRequest). It guards against the correlation id code being executed again when the async dispatcher Thread runs to return the results (this is interesting to note, as I initially didn’t expect the async dispatch to trigger the execution of the filter again!).
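One refinement worth considering, and one raised in the comments below, is clearing the ThreadLocal in a finally block so that pooled container Threads never carry an id over into the next request. The sketch below models the filter logic without the Servlet API so it can run on its own: the Map of headers, the Dispatch enum (standing in for DispatcherType) and the Supplier (standing in for filterChain.doFilter(...)) are all hypothetical stand-ins, not the code from my repo.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

class CorrelationFilterSketch {

    static final String CORRELATION_ID = "correlationId";

    // Stand-in for RequestCorrelation's static ThreadLocal.
    private static final ThreadLocal<String> ID = new ThreadLocal<>();

    /** Hypothetical stand-in for javax.servlet.DispatcherType. */
    enum Dispatch { REQUEST, ASYNC }

    static String currentId() {
        return ID.get();
    }

    /**
     * Mirrors CorrelationHeaderFilter.doFilter: reuse the incoming id or generate one,
     * run the rest of the chain, then clear the ThreadLocal however the chain exits.
     */
    static <T> T handle(Map<String, String> headers, Dispatch dispatch, Supplier<T> chain) {
        if (dispatch == Dispatch.ASYNC) {
            // The async re-dispatch only writes the finished result; leave the id alone.
            return chain.get();
        }
        String id = headers.get(CORRELATION_ID);
        if (id == null) {
            id = UUID.randomUUID().toString();
        }
        ID.set(id);
        try {
            return chain.get();
        } finally {
            ID.remove(); // container Threads are pooled, so never leave a stale id behind
        }
    }
}
```

In the asynchronous flow, the real filterChain.doFilter(...) returns on the original Thread once async processing has started, so removing the id at that point should not affect the worker Thread, which gets its own copy via CorrelationCallable.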

Here is the RequestCorrelation Class, which contains a simple ThreadLocal<String> static variable to hold the correlation id for the current Thread of execution (set via the CorrelationHeaderFilter above):

public class RequestCorrelation {

    public static final String CORRELATION_ID = "correlationId";

    private static final ThreadLocal<String> id = new ThreadLocal<String>();


    public static String getId() { return id.get(); }

    public static void setId(String correlationId) { id.set(correlationId); }
}
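To see why a single static ThreadLocal is safe when several synchronous requests are in flight at once, the following sketch runs two simulated requests on separate Threads. The ID field is a stand-in for RequestCorrelation, and the CountDownLatch only exists to make sure both ids are set before either is read back.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadLocalIsolationDemo {

    // Stand-in for RequestCorrelation's static ThreadLocal<String>.
    private static final ThreadLocal<String> ID = new ThreadLocal<>();

    /** One simulated synchronous request: set the id, wait for the other request, read it back. */
    static String simulateRequest(String correlationId, CountDownLatch bothSet) throws InterruptedException {
        ID.set(correlationId);
        try {
            bothSet.countDown();
            bothSet.await(); // both Threads have now set their own id
            return ID.get(); // still this Thread's value, not the other request's
        } finally {
            ID.remove();
        }
    }

    /** Runs two requests concurrently and returns the id each one read back. */
    static String[] runTwoRequests() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch bothSet = new CountDownLatch(2);
        try {
            Future<String> first = pool.submit(() -> simulateRequest("id-A", bothSet));
            Future<String> second = pool.submit(() -> simulateRequest("id-B", bothSet));
            return new String[] { first.get(), second.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each request reads back exactly the id it set, even though both were stored in the same static field.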

Once the correlation id is stored in the RequestCorrelation Class, it can be retrieved and added to downstream service requests (or data store access etc.) as required by calling the static getId() method within RequestCorrelation. It is probably a good idea to encapsulate this behaviour away from your application services. You can see an example of how to do this in the CorrelatingRestClient Class I have created, which composes Spring’s RestTemplate and sets the correlation id header transparently to the calling Class.

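import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class CorrelatingRestClient implements RestClient {

    private static final Logger LOGGER = LoggerFactory.getLogger(CorrelatingRestClient.class);

    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public String getForString(String uri) {
        String correlationId = RequestCorrelation.getId();
        HttpHeaders httpHeaders = new HttpHeaders();
        if (correlationId != null) {
            // only propagate a header when the current Thread actually has an id
            httpHeaders.set(RequestCorrelation.CORRELATION_ID, correlationId);
        }

        LOGGER.info("start REST request to {} with correlationId {}", uri, correlationId);

        //TODO: error-handling and fault-tolerance in production
        ResponseEntity<String> response = restTemplate.exchange(uri, HttpMethod.GET,
                new HttpEntity<>(httpHeaders), String.class);

        LOGGER.info("completed REST request to {} with correlationId {}", uri, correlationId);

        return response.getBody();
    }
}


//... calling Class
public String exampleMethod() {
    RestClient restClient = new CorrelatingRestClient();
    return restClient.getForString(URI_LOCATION); //correlation id handling completely abstracted to RestClient impl
}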
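The same encapsulation works with other HTTP clients. As an illustration (not the code in my repo), here is a sketch using the JDK 11+ java.net.http API instead of RestTemplate; the CURRENT_ID ThreadLocal is a stand-in for RequestCorrelation, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class CorrelatingRequests {

    static final String CORRELATION_ID = "correlationId";

    // Stand-in for RequestCorrelation's ThreadLocal.
    static final ThreadLocal<String> CURRENT_ID = new ThreadLocal<>();

    /** Builds a GET request that carries the current Thread's correlation id, if it has one. */
    static HttpRequest correlatedGet(String uri) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(uri)).GET();
        String correlationId = CURRENT_ID.get();
        if (correlationId != null) {
            builder.header(CORRELATION_ID, correlationId);
        }
        return builder.build();
    }
}
```

To actually perform the call you would pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the calling code never has to touch the id itself.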

Making this work for asynchronous requests…

The code included above works fine when you are handling all of your requests synchronously, but it is often a good idea in a SOA/microservice platform to handle requests in a non-blocking asynchronous manner. In Spring this can be achieved by using the DeferredResult Class in combination with the Servlet 3 asynchronous support. The problem with using ThreadLocal variables within the asynchronous approach is that the Thread that initially handles the request (and creates the DeferredResult/Future) will not be the Thread doing the actual processing.
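The following sketch demonstrates the problem in isolation: an id set on the calling Thread is simply not visible to a task running in an ExecutorService. ID is again a stand-in for RequestCorrelation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LostCorrelationDemo {

    // Stand-in for RequestCorrelation's ThreadLocal.
    static final ThreadLocal<String> ID = new ThreadLocal<>();

    /** Sets the id on the calling Thread, then returns what a pool Thread sees. */
    static String idSeenByWorker(String correlationId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ID.set(correlationId);
        try {
            Callable<String> readId = ID::get; // executed on the pool Thread, not this one
            return pool.submit(readId).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            ID.remove();
            pool.shutdown();
        }
    }
}
```

The worker sees null. Switching to InheritableThreadLocal is not a reliable fix either: values are only copied when a Thread is created, and pooled Threads are reused across many requests.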

Accordingly, a bit of glue code is needed to ensure that the correlation id is propagated across Threads. This can be achieved by wrapping the target Callable in a Callable that adds the required functionality. Don’t worry if the example calling Class code doesn’t look intuitive: this adaptation between DeferredResults and Futures is a necessary evil within Spring, and the full code, including the boilerplate ListenableFutureAdapter, is in my GitHub repo.

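import java.util.concurrent.Callable;

public class CorrelationCallable<V> implements Callable<V> {

    private final String correlationId;
    private final Callable<V> callable;

    public CorrelationCallable(Callable<V> targetCallable) {
        // captured on the request Thread, when the task is created
        correlationId = RequestCorrelation.getId();
        callable = targetCallable;
    }

    @Override
    public V call() throws Exception {
        // runs on the worker Thread: install the captured id, then restore the worker's previous value
        String previousId = RequestCorrelation.getId();
        RequestCorrelation.setId(correlationId);
        try {
            return callable.call();
        } finally {
            RequestCorrelation.setId(previousId);
        }
    }
}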

//... Calling Class

@RequestMapping("externalNews")
public DeferredResult<String> externalNews() {
    return new ListenableFutureAdapter<>(service.submit(new CorrelationCallable<>(externalNewsService::getNews)));
}
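If remembering to wrap every task feels error-prone, one alternative (not what my repo does) is to decorate the executor itself, so that every submitted task captures the caller's id automatically. In this sketch CorrelatedExecutor is a hypothetical helper and ID stands in for RequestCorrelation; withCurrentId follows the same capture-then-install pattern as CorrelationCallable and restores the worker's previous value afterwards.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class CorrelatedExecutor {

    // Stand-in for RequestCorrelation's ThreadLocal.
    static final ThreadLocal<String> ID = new ThreadLocal<>();

    private final ExecutorService delegate;

    CorrelatedExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    /** Captures the id now (submitting Thread) and installs it when the task runs (worker Thread). */
    static <V> Callable<V> withCurrentId(Callable<V> task) {
        final String captured = ID.get();
        return () -> {
            String previous = ID.get();
            ID.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    ID.remove();
                } else {
                    ID.set(previous);
                }
            }
        };
    }

    /** Every task goes through withCurrentId, so callers cannot forget to wrap it. */
    <V> Future<V> submit(Callable<V> task) {
        return delegate.submit(withCurrentId(task));
    }

    void shutdown() {
        delegate.shutdown();
    }
}
```

The trade-off is that any code submitting directly to the underlying ExecutorService bypasses the wrapping, so the decorated executor should be the only one exposed to application code.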

And there we have it: the correlation id is propagated regardless of whether processing is synchronous or asynchronous!

You can clone the GitHub repo containing my asynchronous example and run the application by executing mvn spring-boot:run at the command line. If you access http://localhost:8080/externalNews in your browser (or via curl), you will see something similar to the following in your Spring Boot console. It clearly shows a correlation id being generated on the initial request and then propagated through to a simulated external call (have a look at the ExternalNewsServiceRest Class to see how this has been implemented):

[nio-8080-exec-1] u.c.t.e.c.w.f.CorrelationHeaderFilter    : No correlationId found in Header. Generated : d205991b-c613-4acd-97b8-97112b2b2ad0
[pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient      : start REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0
[nio-8080-exec-2] u.c.t.e.c.w.f.CorrelationHeaderFilter    : Found correlationId in Header : d205991b-c613-4acd-97b8-97112b2b2ad0
[pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient      : completed REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0
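Note that the same id appears on three different Threads (nio-8080-exec-1, pool-1-thread-1 and nio-8080-exec-2). If you want to check this mechanically, the following sketch extracts the correlation ids and Thread names from log lines in the format shown above. The regular expressions are assumptions written against this sample output only, not part of Spring Boot.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogTraceCheck {

    // Thread name at the start of each line, e.g. "[pool-1-thread-1]".
    private static final Pattern THREAD = Pattern.compile("^\\[\\s*([^\\]]+)\\]");

    // A lower-case UUID, as produced by UUID.randomUUID().toString().
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");

    /** Collects the distinct values of one regex group across all lines, in first-seen order. */
    private static Set<String> distinct(Pattern pattern, int group, List<String> lines) {
        Set<String> values = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                values.add(matcher.group(group));
            }
        }
        return values;
    }

    static Set<String> correlationIds(List<String> lines) {
        return distinct(UUID_PATTERN, 0, lines);
    }

    static Set<String> threadNames(List<String> lines) {
        return distinct(THREAD, 1, lines);
    }
}
```

Run against the four lines above, correlationIds(...) returns a single id while threadNames(...) returns three distinct Threads, which is exactly the propagation we were after.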

Conclusion

I’m quite happy with this simple prototype, and it meets the two goals I listed above. Future work will include writing some tests for this code (shame on me for not TDDing!) and extending this functionality to a more realistic example.

I would like to say a massive thanks to Sam, not only for sharing his knowledge at the great talks at Geecon, but also for taking time to respond to my emails. If you’re interested in microservices and related work I can highly recommend Sam’s Microservice book which is available in Early Access at O’Reilly. I’ve enjoyed reading the currently available chapters, and having implemented quite a few SOA projects recently I can relate to a lot of the good advice contained within. I’ll be following the development of this book with keen interest!

Resources

I used Tomasz Nurkiewicz’s excellent blog several times to learn how best to wire up all of the DeferredResult/Future code in Spring:

http://www.nurkiewicz.com/2013/03/deferredresult-asynchronous-processing.html

Daniel Bryant

Daniel Bryant spends his day working as a software development consultant, specialising in designing and deploying JVM and NoSQL-based business solutions to the cloud for Small-to-Medium Enterprises. By night he works on several open-source projects, primarily with the goal of contributing back to the community, but also to learn about exciting new technologies. Daniel is an active member of the London Java Community, and as part of this role he is currently working on the 'Adopt OpenJDK' Betterrev project. This project strives to make contributions to the OpenJDK source code much simpler, with the ultimate goal of promoting community engagement in the Reference Implementation for Java. During any remaining spare time Daniel also keeps an eye on academic developments within Artificial Intelligence and defeasible reasoning, having gained a PhD in this topic during his misspent youth as an academic.

5 Comments
Bart Prokop
10 years ago

Hi,

I have used correlation IDs wherever possible for years ;) However, what I noticed during Sam’s (and others’) talks at this year’s Geecon was that everyone is using REST and HTTP for inter-service communication. I only want to remind you that correlationId is a concept that has been present in messaging technology for 25 years or more. It also makes much more sense to me to use messaging middleware (AMQP, for example) as a robust communication method between microservices (although I like REST). It looks a bit like we are rediscovering old concepts that were always around…

BTW, good article!

Daniel Bryant
10 years ago
Reply to  Bart Prokop

Hi Bart,

Yeah, this is definitely a valid point. In my career I have found that trends do repeat in a cyclic fashion (functional programming anyone?), and therefore I believe it is important to constantly expose people new to the industry to the fundamentals of computing, and also distributed computing for the cloud etc.

Thanks for the comments!

David Artiga
10 years ago

You should clean the threadlocal in a finally block in the filter, or suffer the OutOfMemory curse!

Daniel Bryant
10 years ago
Reply to  David Artiga

Hi David,

Many thanks for the comment – I’ll try and add this into my git repo soon. I think this would only be an issue with multiple restarts of the app container (and associated Classloader), but I’m not 100% sure?

David Artiga
10 years ago
Reply to  Daniel Bryant

Hi Daniel,

Yes, that’s the main reason.

On the other hand, I’ve seen weird behaviors on some app containers while handling web thread pools (one specially, commercial), so in order to keep it safe, I always clean the mess after working with a ThreadLocal :P

Nice article!
/David
