How does Hibernate Collection Cache work
Introduction
Previously, I described the second-level cache entry structure, Hibernate uses for storing entities. Besides entities, Hibernate can also store entity associations and this article will unravel the inner workings of collection caching.
Domain model
For the up-coming tests we are going to use the following entity model:
A Repository has a collection of Commit entities:
@org.hibernate.annotations.Cache( usage = CacheConcurrencyStrategy.READ_WRITE ) @OneToMany(mappedBy = "repository", cascade = CascadeType.ALL, orphanRemoval = true) private List<Commit> commits = new ArrayList<>();
Each Commit entity has a collection of Change embeddable elements.
@ElementCollection @CollectionTable( name="commit_change", joinColumns = @JoinColumn(name="commit_id") ) @org.hibernate.annotations.Cache( usage = CacheConcurrencyStrategy.READ_WRITE ) @OrderColumn(name = "index_id") private List<Change> changes = new ArrayList<>();
And we’ll now insert some test data:
doInTransaction(session -> { Repository repository = new Repository("Hibernate-Master-Class"); session.persist(repository); Commit commit1 = new Commit(); commit1.getChanges().add( new Change("README.txt", "0a1,5...") ); commit1.getChanges().add( new Change("web.xml", "17c17...") ); Commit commit2 = new Commit(); commit2.getChanges().add( new Change("README.txt", "0b2,5...") ); repository.addCommit(commit1); repository.addCommit(commit2); session.persist(commit1); });
Read-through caching
The Collection cache employs a read-through synchronization strategy:
doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); for (Commit commit : repository.getCommits()) { assertFalse(commit.getChanges().isEmpty()); } });
and collections are cached upon being accessed for the first time:
select collection0_.id as id1_0_0_, collection0_.name as name2_0_0_ from Repository collection0_ where collection0_.id=1 select commits0_.repository_id as reposito3_0_0_, commits0_.id as id1_1_0_, commits0_.id as id1_1_1_, commits0_.repository_id as reposito3_1_1_, commits0_.review as review2_1_1_ from commit commits0_ where commits0_.r select changes0_.commit_id as commit_i1_1_0_, changes0_.diff as diff2_2_0_, changes0_.path as path3_2_0_, changes0_.index_id as index_id4_0_ from commit_change changes0_ where changes0_.commit_id=1 select changes0_.commit_id as commit_i1_1_0_, changes0_.diff as diff2_2_0_, changes0_.path as path3_2_0_, changes0_.index_id as index_id4_0_ from commit_change changes0_ where changes0_.commit_id=2
After the Repository and its associated Commits get cached, loading the Repository and traversing the Commit and Change collections will not hit the database, since all entities and their associations are served from the second-level cache:
LOGGER.info("Load collections from cache"); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(2, repository.getCommits().size()); });
There’s no SQL SELECT statement executed when running the previous test case:
CollectionCacheTest - Load collections from cache JdbcTransaction - committed JDBC Connection
Collection cache entry structure
For entity collections, Hibernate only stores the entity identifiers, therefore requiring that entities be cached as well:
key = {org.hibernate.cache.spi.CacheKey@3981} key = {java.lang.Long@3597} "1" type = {org.hibernate.type.LongType@3598} entityOrRoleName = {java.lang.String@3599} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Repository.commits" tenantId = null hashCode = 31 value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3982} value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3986} "CollectionCacheEntry[1,2]" version = null timestamp = 5858841154416640
The CollectionCacheEntry stores the Commit identifiers associated with a given Repository entity.
Because element types don’t have identifiers, Hibernate stores their dehydrated state instead. The Change embeddable is cached as follows:
key = {org.hibernate.cache.spi.CacheKey@3970} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes#1" key = {java.lang.Long@3974} "1" type = {org.hibernate.type.LongType@3975} entityOrRoleName = {java.lang.String@3976} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes" tenantId = null hashCode = 31 value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3971} value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3978} state = {java.io.Serializable[2]@3980} 0 = {java.lang.Object[2]@3981} 0 = {java.lang.String@3985} "0a1,5..." 1 = {java.lang.String@3986} "README.txt" 1 = {java.lang.Object[2]@3982} 0 = {java.lang.String@3983} "17c17..." 1 = {java.lang.String@3984} "web.xml" version = null timestamp = 5858843026345984
Collection Cache consistency model
Consistency is the biggest concern when employing caching, so we need to understand how the Hibernate Collection Cache handles entity state changes.
The CollectionUpdateAction is responsible for all Collection modifications and whenever the collection changes, the associated cache entry is evicted:
protected final void evict() throws CacheException { if ( persister.hasCache() ) { final CacheKey ck = session.generateCacheKey( key, persister.getKeyType(), persister.getRole() ); persister.getCacheAccessStrategy().remove( ck ); } }
This behavior is also documented by the CollectionRegionAccessStrategy specification:
For cached collection data, all modification actions actually just invalidate the entry(s).
Based on the current concurrency strategy, the Collection Cache entry is evicted:
- before the current transaction is committed, for CacheConcurrencyStrategy.NONSTRICT_READ_WRITE
- right after the current transaction is committed, for CacheConcurrencyStrategy.READ_WRITE
- exactly when the current transaction is committed, for CacheConcurrencyStrategy.TRANSACTIONAL
Adding new Collection entries
The following test case adds a new Commit entity to our Repository:
LOGGER.info("Adding invalidates Collection Cache"); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(2, repository.getCommits().size()); Commit commit = new Commit(); commit.getChanges().add( new Change("Main.java", "0b3,17...") ); repository.addCommit(commit); }); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(3, repository.getCommits().size()); });
Running this test generates the following output:
--Adding invalidates Collection Cache insert into commit (id, repository_id, review) values (default, 1, false) insert into commit_change (commit_id, index_id, diff, path) values (3, 0, '0b3,17...', 'Main.java') --committed JDBC Connection select commits0_.repository_id as reposito3_0_0_, commits0_.id as id1_1_0_, commits0_.id as id11_1_1_, commits0_.repository_id as reposito3_1_1_, commits0_.review as review2_1_1_ from commit commits0_ where commits0_.repository_id=1 --committed JDBC Connection
After a new Commit entity is persisted, the Repository.commits collection cache is cleared and the associated Commits entities are fetched from the database (the next time the collection is accessed).
Removing existing Collection entries
Removing a Collection element follows the same pattern:
LOGGER.info("Removing invalidates Collection Cache"); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(2, repository.getCommits().size()); Commit removable = repository.getCommits().get(0); repository.removeCommit(removable); }); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(1, repository.getCommits().size()); });
The following output gets generated:
--Removing invalidates Collection Cache delete from commit_change where commit_id=1 delete from commit where id=1 --committed JDBC Connection select commits0_.repository_id as reposito3_0_0_, commits0_.id as id1_1_0_, commits0_.id as id1_1_1_, commits0_.repository_id as reposito3_1_1_, commits0_.review as review2_1_1_ from commit commits0_ where commits0_.repository_id=1 --committed JDBC Connection
The Collection Cache is evicted once its structure gets changed.
Removing Collection elements directly
Hibernate can ensure cache consistency, as long as it’s aware of all changes the target cached collection undergoes. Hibernate uses its own Collection types (e.g. PersistentBag, PersistentSet) to allow lazy-loading or detect dirty state.
If an internal Collection element is deleted without updating the Collection state, Hibernate won’t be able to invalidate the currently cached Collection entry:
LOGGER.info("Removing Child causes inconsistencies"); doInTransaction(session -> { Commit commit = (Commit) session.get(Commit.class, 1L); session.delete(commit); }); try { doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); assertEquals(1, repository.getCommits().size()); }); } catch (ObjectNotFoundException e) { LOGGER.warn("Object not found", e); }
--Removing Child causes inconsistencies delete from commit_change where commit_id=1 delete from commit where id=1 -committed JDBC Connection select collection0_.id as id1_1_0_, collection0_.repository_id as reposito3_1_0_, collection0_.review as review2_1_0_ from commit collection0_ where collection0_.id=1 --No row with the given identifier exists: -- [CollectionCacheTest$Commit#1] --rolled JDBC Connection
When the Commit entity was deleted, Hibernate didn’t know it had to update all the associated Collection Caches. The next time we load the Commit collection, Hibernate will realize some entities don’t exist anymore and it will throw an exception.
Updating Collection elements using HQL
Hibernate can maintain cache consistency when executing bulk updates through HQL:
LOGGER.info("Updating Child entities using HQL"); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); for (Commit commit : repository.getCommits()) { assertFalse(commit.review); } }); doInTransaction(session -> { session.createQuery( "update Commit c " + "set c.review = true ") .executeUpdate(); }); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); for(Commit commit : repository.getCommits()) { assertTrue(commit.review); } });
Running this test case generates the following SQL:
--Updating Child entities using HQL --committed JDBC Connection update commit set review=true --committed JDBC Connection select commits0_.repository_id as reposito3_0_0_, commits0_.id as id1_1_0_, commits0_.id as id1_1_1_, commits0_.repository_id as reposito3_1_1_, commits0_.review as review2_1_1_ from commit commits0_ where commits0_.repository_id=1 --committed JDBC Connection
The first transaction doesn’t require hitting the database, only relying on the second-level cache. The HQL UPDATE clears the Collection Cache, so Hibernate will have to reload it from the database when the collection is accessed afterwards.
Updating Collection elements using SQL
Hibernate can also invalidate cache entries for bulk SQL UPDATE statements:
LOGGER.info("Updating Child entities using SQL"); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); for (Commit commit : repository.getCommits()) { assertFalse(commit.review); } }); doInTransaction(session -> { session.createSQLQuery( "update Commit c " + "set c.review = true ") .addSynchronizedEntityClass(Commit.class) .executeUpdate(); }); doInTransaction(session -> { Repository repository = (Repository) session.get(Repository.class, 1L); for(Commit commit : repository.getCommits()) { assertTrue(commit.review); } });
Generating the following output:
--Updating Child entities using SQL --committed JDBC Connection update commit set review=true --committed JDBC Connection select commits0_.repository_id as reposito3_0_0_, commits0_.id as id1_1_0_, commits0_.id as id1_1_1_, commits0_.repository_id as reposito3_1_1_, commits0_.review as review2_1_1_ from commit commits0_ where commits0_.repository_id=1 --committed JDBC Connection
The BulkOperationCleanupAction is responsible for cleaning up the second-level cache on bulk DML statements. While Hibernate can detect the affected cache regions when executing an HQL statement, for native queries you need to instruct Hibernate what regions the statement should invalidate. If you don’t specify any such region, Hibernate will clear all second-level cache regions.
Conclusion
The Collection Cache is a very useful feature, complementing the second-level entity cache. This way we can store an entire entity graph, reducing the database querying workload in read-mostly applications. Like with AUTO flushing, Hibernate cannot introspect the affected table spaces when executing native queries. To avoid consistency issues (when using AUTO flushing) or cache misses (second-level cache), whenever we need to run a native query we have to explicitly declare the targeted tables, so Hibernate can take the appropriate actions (e.g. flushing or invalidating cache regions).
- Code available on GitHub.
Reference: | How does Hibernate Collection Cache work from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog. |