Natural Ids in Hibernate
A natural id is a property, or a set of properties, that uniquely identifies an entity. An entity can have at most one natural id. When Hibernate sees the natural-id tag in an entity mapping file, it automatically creates unique and not-null constraints on the properties that make up the natural id. First, let us look at examples of simple and composite natural ids.
Simple Natural Id: A person can be uniquely identified by his Voter Id, so voterId can serve as the natural id of a person.
<!-- Version 1 -->
<hibernate-mapping package="com.pramati.model">
    <class name="Person" table="PERSON">
        <id name="id" column="ID">
            <generator class="native"/>
        </id>
        <natural-id>
            <property name="voterId" type="string" column="VOTER_ID"/>
        </natural-id>
        <property name="name" type="string" column="NAME"/>
        <!-- Other properties -->
    </class>
</hibernate-mapping>
Composite Natural Id: A phone number, i.e. the combination of STD code and landline number, can form the natural id of the person entity.
<!-- Version 2 -->
<hibernate-mapping package="com.pramati.model">
    <class name="Person" table="PERSON">
        <id name="id" column="ID">
            <generator class="native"/>
        </id>
        <natural-id>
            <property name="stdCode" type="string" column="STD_CODE"/>
            <property name="landlineNumber" type="string" column="LANDLINE_NUMBER"/>
        </natural-id>
        <property name="name" type="string" column="NAME"/>
        <!-- Other properties -->
    </class>
</hibernate-mapping>
Here Hibernate creates a not-null constraint on each of stdCode and landlineNumber, and a unique constraint on the two together: the combination must be unique for every person entity.
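To make the effect of these constraints concrete, here is a minimal sketch in plain Java. It is not Hibernate code: PersonTable is a hypothetical in-memory stand-in for the PERSON table, and it only illustrates the two rules that apply to the composite natural id of Version 2 (no null component, no duplicate combination).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the PERSON table; not Hibernate code. */
class PersonTable {
    // Maps the composite natural id (stdCode, landlineNumber) to the person's name.
    private final Map<List<String>, String> rows = new HashMap<>();

    /** Inserts a row, enforcing the not-null and unique rules on the natural id. */
    void insert(String stdCode, String landlineNumber, String name) {
        if (stdCode == null || landlineNumber == null) {
            throw new IllegalArgumentException("natural id columns must not be null");
        }
        // Lists compare by content, so equal (stdCode, landlineNumber) pairs collide.
        List<String> naturalId = Arrays.asList(stdCode, landlineNumber);
        if (rows.containsKey(naturalId)) {
            throw new IllegalStateException("duplicate natural id: " + naturalId);
        }
        rows.put(naturalId, name);
    }

    /** Number of rows currently stored. */
    int size() {
        return rows.size();
    }
}
```

Two persons may share an STD code as long as their landline numbers differ; inserting the same pair twice, or a pair with a null component, is rejected.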
Natural ids are immutable by default. If you load a person entity from the database and change any of the properties that make up its natural id, Hibernate throws an exception. For example, if we load a Person and modify his landlineNumber or stdCode in an active session, this is the exception we get:
org.hibernate.HibernateException: An immutable natural identifier of entity com.pramati.model.Person was altered from abc to xyz
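The idea behind this check can be sketched in plain Java. This is a simplified model and not Hibernate's implementation: TrackedPerson is a hypothetical class that remembers the natural id value seen at load time and compares it with the current value when flush() is called.

```java
import java.util.Objects;

/** Simplified model of an immutable natural id check; not Hibernate's implementation. */
class TrackedPerson {
    private final String loadedVoterId; // snapshot taken when the entity was loaded
    String voterId;                     // current value, which callers may modify

    TrackedPerson(String voterId) {
        this.loadedVoterId = voterId;
        this.voterId = voterId;
    }

    /** Compares the snapshot with the current value and rejects any change. */
    void flush() {
        if (!Objects.equals(loadedVoterId, voterId)) {
            throw new IllegalStateException(
                    "An immutable natural identifier was altered from "
                            + loadedVoterId + " to " + voterId);
        }
    }
}
```

Calling flush() on an unmodified TrackedPerson does nothing; assigning a different voterId first makes flush() fail with a message of the same shape as the one above.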
Hibernate 4.1 introduced the ability to load entities by natural id. Until then, the session cache held only objects loaded through get/load within the current session; objects loaded by natural id are now cached by default as well. Here are the additions to the Session API:
public NaturalIdLoadAccess byNaturalId(String entityName);
public NaturalIdLoadAccess byNaturalId(Class entityClass);
public SimpleNaturalIdLoadAccess bySimpleNaturalId(String entityName);
public SimpleNaturalIdLoadAccess bySimpleNaturalId(Class entityClass);
We can load instances of a class by natural id as follows:
// In case of Version 1 defined above:
Person person = (Person) session.byNaturalId(Person.class)
        .using("voterId", "ZAAXDFT435")
        .load();

// For Version 1, this can be simplified as:
Person samePerson = (Person) session.bySimpleNaturalId(Person.class)
        .load("ZAAXDFT435");

// In case of Version 2 defined above:
Person personByPhone = (Person) session.byNaturalId(Person.class)
        .using("stdCode", "040")
        .using("landlineNumber", "2345678")
        .load();
Note that the entity returned by load is not just a proxy but the actual entity itself. If we want to get a proxy, then instead of load() we have to use getReference() as follows:
session.byNaturalId(Person.class)
        .using("stdCode", "040")
        .using("landlineNumber", "2345678")
        .getReference();
For consistency, the new approach has been made available for identifier based loading as well.
public IdentifierLoadAccess byId(String entityName);
public IdentifierLoadAccess byId(Class entityClass);
So instead of session.load(Person.class, id) we can use session.byId(Person.class).getReference(id), and instead of session.get(Person.class, id) we can use session.byId(Person.class).load(id).
Natural ids are also beneficial when we use the query cache. The query cache is often less useful than expected because it gets invalidated very frequently. Suppose we have the following sequence of events:
Scenario-1:
1. Run an HQL query that loads person A using the properties of the entity's natural id. The query is cacheable, i.e. query.setCacheable(true) has been called.
2. Another person B is inserted into Person table.
3. Now load A again using the same query that we have used in step 1.
The question is: in step 3, will Hibernate make a new database call to fetch A from the Person table?
The answer is yes. Hibernate internally maintains a timestamp cache that records the time at which each Hibernate-managed table was last modified. At step 3, Hibernate sees that the query is cached, but before returning the cached result it checks whether that result is older than the last modification of the table. Because the table was modified after the result was cached, Hibernate runs the query against the database again.
To understand this better, consider the following scenario. Suppose the Person table has only one record, with the name Rama.
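The validation described above can be sketched in plain Java. This is a toy model and not Hibernate's implementation: TimestampedQueryCache, its method names and its logical clock are all assumptions made for illustration. It only shows the rule that a cached result is discarded when its table was modified after the result was cached.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a query cache guarded by per-table timestamps; not Hibernate code. */
class TimestampedQueryCache {
    private static final class Entry {
        final Object result;
        final long cachedAt;

        Entry(Object result, long cachedAt) {
            this.result = result;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, Long> tableModifiedAt = new HashMap<>();
    private final Map<String, Entry> results = new HashMap<>();
    private long clock = 0; // logical clock standing in for real timestamps

    /** Records a write to the table; the kind of write (insert, update, delete) is not tracked. */
    void tableModified(String table) {
        tableModifiedAt.put(table, ++clock);
    }

    /** Stores a query result together with the time at which it was cached. */
    void put(String query, Object result) {
        results.put(query, new Entry(result, ++clock));
    }

    /** Returns the cached result, or null if the table changed after it was cached. */
    Object get(String query, String table) {
        Entry entry = results.get(query);
        if (entry == null) {
            return null;
        }
        long modifiedAt = tableModifiedAt.getOrDefault(table, 0L);
        return modifiedAt > entry.cachedAt ? null : entry.result;
    }
}
```

In terms of Scenario-1: put() corresponds to step 1, tableModified("PERSON") to step 2, and the get() at step 3 returns null, which is the point at which the database would be queried again.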
Scenario-2:
a. Execute a cached query to get the list of persons whose name is 'Rama': "from Person where name='Rama'"
b. Insert another record into Person whose name is also 'Rama'. This is allowed because name is not defined as a unique property.
c. Now execute the query in step(a) again.
At step (a) we get only one record. At step (c), Hibernate hits the database again even though the result is cached, and this time the invalidation is justified because the result has changed. Hibernate only checks whether the table has been modified before returning the result from the cache; it does not look at how the table was modified, whether by an update, an insert, or anything else.
In Scenario-1, however, this validation check is irrelevant, because the inserted record has nothing to do with the entity that was loaded. The check can be bypassed by using the natural id to fetch the entity: because a natural id is unique and immutable, the result of such a lookup does not change when other rows are inserted or updated. Before Hibernate supported loading entities by natural id through the Session API, natural ids could already be used through the Criteria API. We can use the following at steps 1 and 3 of Scenario-1:
session.createCriteria(Person.class)
        .add(Restrictions.naturalId()
                .set("stdCode", person.getStdCode())
                .set("landlineNumber", person.getLandlineNumber()))
        .setCacheable(true)
        .uniqueResult();
When natural ids are used to fetch the entity, the timestamp cache check is bypassed. So if steps 1 and 3 of Scenario-1 use this criteria query instead of the HQL query, the database is hit only once. Had we used Restrictions.eq instead of Restrictions.naturalId, the database would have been hit twice. With recent versions of Hibernate, we can use the new loading API instead of building a criteria query.
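The difference between the two kinds of lookup can be sketched in plain Java. As before, this is a toy model under stated assumptions and not Hibernate's implementation: NaturalIdAwareCache and the naturalIdLookup flag are hypothetical, and a single table with a logical clock stands in for the timestamp cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a cache that skips the staleness check for natural id lookups. */
class NaturalIdAwareCache {
    private final Map<String, Object> results = new HashMap<>();
    private final Map<String, Long> cachedAt = new HashMap<>();
    private long tableModifiedAt = 0; // last write to the single modelled table
    private long clock = 0;           // logical clock standing in for real timestamps

    /** Caches a result and remembers when it was cached. */
    void put(String key, Object result) {
        results.put(key, result);
        cachedAt.put(key, ++clock);
    }

    /** Records a write to the table. */
    void tableModified() {
        tableModifiedAt = ++clock;
    }

    /**
     * Returns the cached result. Ordinary lookups are dropped when the table
     * changed after caching; natural id lookups skip that check.
     */
    Object get(String key, boolean naturalIdLookup) {
        Object result = results.get(key);
        if (result == null) {
            return null;
        }
        boolean stale = tableModifiedAt > cachedAt.get(key);
        return (!naturalIdLookup && stale) ? null : result;
    }
}
```

After put() followed by tableModified(), get(key, true) still returns the cached value, which mirrors the Restrictions.naturalId case, while get(key, false) returns null, which mirrors the Restrictions.eq case where the database is hit a second time.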