State Does Not Belong In The Code
Your application needs to be scalable. This means it needs to be run in a cluster, and state is the hardest thing to distribute. If you minimize the places where state is stored, you minimize the complexity of clustering. But state should exist, and here is where it is fine to have it:
- the database – be it SQL, NoSQL or even a search engine, it’s the main thing that stores state. It is the thing that is supposed to support clustering, or a huge dedicated machine that handles requests from multiple other “code” servers. The code communicates with the database, but the code itself does not store anything for more than one client request;
- cache – caching is relatively easy to distribute (it’s basically key-value). There are many ready-to-use solutions like EhCache and memcached. So instead of computing a result, or getting it from the DB on each request, you can configure caching and store the result in memory. But again – code does not store anything – it just populates and queries the cache;
- HTTP session – in web components (controllers, managed beans, whatever you call it). It is very similar to caching, though it has a different purpose – to allow identifying subsequent actions by the same user (http itself is stateless). But as your code runs on multiple machines, the load-balancer may not always send subsequent requests to the same server. So the session should also be replicated across all servers. Fortunately, most containers have that option built-in, so you just add one configuration line. Alternatively you can instruct the load-balancer to use a “sticky session” (identify which server to send the request depending on the session cookie), but it moves some state management to the load-balancer as well. Regardless of the option you choose, do not put too much data in the session
- the file system – when you store files, you need them to be accessible to all machines. There are multiple options here, including SAN or using a cloud storage service like Amazon S3, which are accessible through an API
All these are managed outside the code. Your code simply consumes them through an API (the Session API, the cache API, JDBC, S3/file system API). If the code contained any of that state (as instance-variables of your objects) the application would be hard to support (you’d have to manage state yourself) and will be less scalable. Of course, there are some rare cases, where you can’t go without storing state in the code. Document these and make sure they do not rely on working in a cluster.
But what can go wrong if you store state in the objects that perform the business logic? You have two options then:
- synchronize access to fields – this will kill performance, because all users that make requests will have to wait in queue for the service to manage its fields;
- make new instance of your class for each HTTP request, and manage the instances somehow. Managing these instances is the hard part. People may be inclined to choose the session to do it, which means the session grows very large and gets harder to replicate (sharing a lot of data across multiple machines is slower, and session replication must be fast). Not to mention the unnecessarily increased memory footprint.
Here’s a trivial example of what not to do. You should pass these kinds of values as method arguments, rather than storing them in the instance:
class OrderService { double orderPrice; void processOrder(OrderDto order) { for (Entry entry : order.getEntries() { orderPrice += entry.getPrice(); } boolean discounts = hasDiscounts(order); } boolean hasDiscounts(OrderDto order) { return order.getEntries().length > 5 && orderPrice > 200; } }
So, make all your code stateless – this will ensure at least some level of scalability.
Reference: State Does Not Belong In The Code from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog.