Enterprise Java

Efficiently delete data with JPA and Hibernate

You may come to the situation where you have to perform a bulk deletion on a huge amount of datasets stored in a relational database. If you use JPA with Hibernate as underlying OR mapper, you might try to call the remove() method of the EntityManager in a way like the following:
 
 
 
 
 
 
 

public void removeById(long id) {
    RootEntity rootEntity = entityManager.getReference(RootEntity.class, id);
    entityManager.remove(rootEntity);
}

First of all, we load a reference representation of the entity we want to delete and then pass this reference to the EntityManager. Let’s assume the RootEntity from above has a child relation to a class called ChildEntity:

@OneToMany(mappedBy = "rootEntity", fetch = FetchType.EAGER, cascade = CascadeType.ALL)
private Set childEntities = new HashSet(0);

If we now turn on the property show_sql of hibernate, we will wonder what SQL statements are issued:

select
        rootentity0_.id as id5_1_,
        rootentity0_.field1 as field2_5_1_,
        rootentity0_.field2 as field3_5_1_,
        childentit1_.PARENT as PARENT5_3_,
        childentit1_.id as id3_,
        childentit1_.id as id4_0_,
        childentit1_.field1 as field2_4_0_,
        childentit1_.field2 as field3_4_0_,
        childentit1_.PARENT as PARENT4_0_
    from
        ROOT_ENTITY rootentity0_
    left outer join
        CHILD_ENTITY childentit1_
            on rootentity0_.id=childentit1_.PARENT
    where
        rootentity0_.id=?

    delete
    from
        CHILD_ENTITY
    where
        id=?

   delete
   from
       ROOT_ENTITY
   where
       id=?

Why does Hibernate first load all data into memory in order to delete this data immediately afterwards? The reason is that JPA’s lifecycle requires that the object is in “managed” state, before it can be deleted. Only in this state all lifecycle functionality like interceptors is available (see here). Therefore Hibernate issues a SELECT query before the deletion in order to transfer both RootEntity and ChildEntity to the “managed” state. But what can we do, if we just want to delete RootEntity and ChildEntity, if we know the id of RootEntity? The answer is to use a simple DELETE query like the following one. But due to the integrity constraint on the child table, we first have to delete all depending child entities. The following code demonstrates how:

List childIds = entityManager.createQuery("select c.id from ChildEntity c where c.rootEntity.id = :pid").setParameter("pid", id).getResultList();
for(Long childId : childIds) {
    entityManager.createQuery("delete from ChildEntity c where c.id = :id").setParameter("id", childId).executeUpdate();
}
entityManager.createQuery("delete from RootEntity r where r.id = :id").setParameter("id", id).executeUpdate();

The above code results in the three SQL statements we would have expected by calling remove(). Now you may argue, that this way of deletion is more complicated than just calling the EntityManager’s remove() method. It also ignores annotations like @OneToMany and @ManyToOne we have placed in the two entity classes. So why not write some code that uses the knowledge about the two entities that already exists in the two class files? First of all, we look for @OneToMany annotations using reflection in the RootEntity class, extract the type of the child entity and then look for its back relation field annotated with @ManyToOne. Having done this, we can easily write the three SQL statements in a more generic way:

public void delete(EntityManager entityManager, Class parentClass, Object parentId) {
    Field idField = getIdField(parentClass);
    if (idField != null) {
        List oneToManyFields = getOneToManyFields(parentClass);
        for (Field field : oneToManyFields) {
            Class childClass = getFirstActualTypeArgument(field);
            if (childClass != null) {
                Field manyToOneField = getManyToOneField(childClass, parentClass);
                Field childClassIdField = getIdField(childClass);
                if (manyToOneField != null && childClassIdField != null) {
                    List childIds = entityManager.createQuery(String.format("select c.%s from %s c where c.%s.%s = :pid", childClassIdField.getName(), childClass.getSimpleName(), manyToOneField.getName(), idField.getName())).setParameter("pid", parentId).getResultList();
                    for (Long childId : childIds) {
                        entityManager.createQuery(String.format("delete from %s c where c.%s = :id", childClass.getSimpleName(), childClassIdField.getName())).setParameter("id", childId).executeUpdate();
                    }
                }
            }
        }
        entityManager.createQuery(String.format("delete from %s e where e.%s = :id", parentClass.getSimpleName(), idField.getName())).setParameter("id", parentId).executeUpdate();
    }
}

The methods getFirstActualTypeArgument(), getManyToOneField(), getIdField() and getOneToManyFields() in the code above are not depicted here, but do what their name sounds like. Once implemented we can easily delete all entities beginning with the root of the tree.

  • A simple example application that can be used to examine the behavior and solution described above, can be found on github.

 

Martin Mois

Martin is a Java EE enthusiast and works for an international operating company. He is interested in clean code and the software craftsmanship approach. He also strongly believes in automated testing and continuous integration.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Marcelo
Marcelo
11 years ago

What about second level cache? Are the records removed from the cache when HQL/JPQL are used?

Martin
11 years ago
Reply to  Marcelo

When you use entityManager.remove() they will be removed from the cache. In Hibernate you can also use the dedicated method session.evict() to remove an entity explicitly.

But in the use case above we do not want to load the entites into the first or second level cache before deleting them. The goal was to delete without loading them into the cache. This way you can delete hundreds of entites with one statement.

josete
josete
11 years ago

Then, why do we use JPA ? Let’s ‘rollback’ to plain old JDBC

Marco Sch
9 years ago

Sometimes, we need to step out of the hibernate frame and use the plain old sql where performance for bulk processes is important. JPA / Hibernate saves us a lot of work, but its still only a not full perfect framework which cant do all our need for us, untill yet :) Maybe in few years or month, we can write a simple hql command, like “delete from Entity.CASCADE_ALL WHERE id = x” and hibernate checks the collections of Entity and his childs for us, build a nice reverse query (to include all affected collections) and respond to the sql… Read more »

Diego Venancio
Diego Venancio
8 years ago

Hi,
When you do this:
for(Long childId : childIds) {
entityManager.createQuery(“delete from ChildEntity c where c.id = :id”).setParameter(“id”, childId).executeUpdate();
}
Aren’t you wasting two more trips on the network when you could do just one putting all ids on one IN clause? Isn’t IO and DB some of our most expensive resources on a system?

Thanks!

Back to top button