Software Development

Solving ORM – Keep the O, Drop the R, no need for the M

ORM has a simple, production-ready solution hiding in plain sight in the Java world. Let’s go through it in this post, alongside with the following topics:

  • ORM / Hibernate in 2014 – the word on the street
  • ORM is still the Vietnam of Computer Science
  • ORM has 2 main goals only
  • When does ORM make sense?
  • A simple solution for the ORM problem
  • A production-ready ORM Java-based alternative

 

ORM / Hibernate in 2014 – the word on the street

It’s been almost 20 years since ORM is around, and soon we will reach the 15th birthday of the creation of the de-facto and likely best ORM implementation in the Java world: Hibernate.

We would then expect that this is by know a well understood problem. But what are developers saying these days about Hibernate and ORM?

Let’s take some quotes from two recent posts on this topic: Thoughts on Hibernate and JPA Hibernate Alternatives:

There are performance problems related to using Hibernate.

A lot of business operations and reports involve writing complex queries. Writing them in terms of objects and maintaining them seems to be difficult.

We shouldn’t be needing a 900 page book to learn a new framework.

As Java developers we can easily relate to that: ORM frameworks tend to give cryptic error messages, the mapping is hard to do and the runtime behavior namely with lazy initialization exceptions can be surprising when first encountered.

Who hasn’t had to maintain that application that uses Open Session In View pattern that generated a flood of SQL requests that took weeks to optimize?

I believe it literally can take a couple of years to really understand Hibernate, lot’s of practice and several readings of the Java Persistence with Hibernate book (still 600 pages in it’s upcoming second edition).

Are the criticisms on Hibernate warranted?

I personally don’t think so, in fact most developers really criticize the complexity of the object-relational mapping approach itself, and not a concrete ORM implementation of it in a given language.

This sentiment seems to come and go in periodic waves, maybe when a newer generation of developers hits the labor force. After hours and days trying to do what feels it should be much simpler, it’s only a natural feeling.

The fact is that there is a problem: why do many projects spend 30% of their time developing the persistence layer still today?

ORM is the Vietnam of Computer Science

The problem is that the ORM problem is complex, and there are no good solutions. Any solution to it is a huge compromise.

ORM has been famously named almost 10 years ago the Vietnam of Computer Science, in a blog post from one of the creators of Stackoverflow, Jeff Atwood.

The problems of ORM are well known and we won’t go through them in detail here, here is a summary from Martin Fowler on why ORM is hard:

  • object identity vs database identity
  • How to map object oriented inheritance in the relational world
  • unidirectional associations in the database vs bi-directional in the OO world
  • Data navigation – lazy loading, eager fetching
  • Database transactions vs no rollbacks in the OO world

This is just to name the main obstacles. The problem is also that it’s easy to forget what we are trying to achieve in the first place.

ORM has 2 main goals only

ORM has two main goals clearly defined:

  • map objects from the OO world into tables in a relational database
  • provide a runtime mechanism for keeping an in-memory graph of objects and a set of database tables in sync

Given this, when should we use Hibernate and ORM in general?

When does ORM make sense ?

ORM makes sense when the project at hand is being done using a Domain Driven Development approach, where the whole program is built around a set of core classes called the domain model, that represent concepts in the real world such as Customer, Invoice, etc.

If the project does not have a minimum threshold complexity that needs DDD, then an ORM can likely be overkill. The problem is that even the most simple of enterprise applications are well above this threshold, so ORM really pulls it’s weight most of the time.

It’s just that ORM is hard to learn and full of pitfalls. So how can we tackle this problem?

A simple solution for the ORM problem

Someone once said something like this:

A smart man solves a problem, but a wise man avoids it.

As often happens in programming, we can find the solution by going back to the beginning and see what we are trying to solve:

sketch

So we are trying to synchronize an in-memory graph of objects with a set of tables. But these are two completely different types of data structures!

But which data structure is the most generic? It turns out that the graph is the most generic one of the two: actually a set of linked database tables is really just a special type of graph.

The same can be said of basically almost any other data structure.

Graphs and their traversal are very well understood and have a body of knowledge of decades available, similar to the theory on which relational databases are built upon: Relational Algebra.

Solving the impedance mismatch

The logical conclusion is that the solution for the ORM impedance mismatch is removing to remove the mismatch itself:

Let’s store the graph of in-memory domain objects in a transactional-capable graph database!

This solves the mapping problem, by removing the need for mapping in the first place.

A production-ready solution for the ORM problem

This is easier said than done, or is it? It turns out that graph databases have been around for years, and the prime example in the Java community is Neo4j.

Neo4j is a stable and mature product that is well understood and documented, see the Neo4J in Action book. It can used as an external server or in embedded mode inside the Java process itself.

But it’s core API is all about graphs and nodes, something like this:

GraphDatabaseService gds = new EmbeddedGraphDatabase("/path/to/store");  
Node forrest=gds.createNode();  
forrest.setProperty("title","Forrest Gump");  
forrest.setProperty("year",1994);  
gds.index().forNodes("movies").add(forrest,"id",1);

Node tom=gds.createNode();

The problem is that this is too far from domain driven development, writing to this would be like coding JDBC by hand.

This is the typical task of a framework like Hibernate, with the big difference that because the impedance mismatch is minimal such framework can operate in a much more transparent and less intrusive way.

It turns out that such framework is already written.

Spring support for Neo4J

One of the creators of the Spring framework Rod Johnson took the task of implementing himself the initial version of the Neo4j integration, the Spring Data Neo4j project.

This is an important extract from the foreword of Rod Johnson in the documentation concerning the design of the framework:

Its use of AspectJ to eliminate persistence code from your domain model is truly innovative, and on the cutting edge of today’s Java technologies.

So Spring Data Neo4J is a AOP-based framework that wraps domain objects in a relatively transparent way, and synchronizes a in-memory graph of objects with a Neo4j transactional data store.

It’s aimed to write the persistence layer of the application in a simplified way, similar to Spring Data JPA.

How does the mapping to a graph database look like

It turns out that there is limited mapping needed (tutorial). We need for one to mark which classes we want to make persistent, and define a field that will act as an Id:

@NodeEntity
class Movie {  
    @GraphId Long nodeId;
    String id;
    String title;
    int year;
    Set cast;
}

There are other annotations (5 more per the docs) for example for defining indexing and relationships with properties, etc. Compared with Hibernate there is only a fraction of the annotations for the same domain model.

What does the query language look like?

The recommended query language is Cypher, that is an ASCII art based language. A query can look for example like this:

// returns users who rated a movie based on movie title (movieTitle parameter) higher than rating (rating parameter)
    @Query("start movie=node:Movie(title={0}) " +
           "match (movie)<-[r:RATED]-(user) " +
           "where r.stars > {1} " +
           "return user")
     Iterable getUsersWhoRatedMovieFromTitle(String movieTitle, Integer rating);

This is a query language called Cypher, which is based on ASCII art. The query language is very different from JPQL or SQL and implies a learning curve.

Still after the learning curve this language allows to write performant queries that usually can be problematic in relational databases.

Performance of Queries in Graph vs Relational databases

Let’s compare some frequent query types and how they should perform in a graph vs relational databases:

  • lookup by Id: This is implemented for example by doing a binary search on an index tree, finding a match and following a ‘pointer’ to the result. This is a (very) simplified description, but it’s likely identical for both databases. There is no apparent reason why such query would take more time in a graph database than in a relational DB.
  • lookup parent relations: This is the type of query that relational databases struggle. Self-joins might result in cartesian products of huge tables, bringing the database to an halt. A graph database can perform those queries in a fraction of that.
  • lookup by non-indexed column: Here the relational database can scan tables faster due to the physical structure of the table and the fact than one read usually brings along multiple rows. But this type of queries (table scans) are to be avoided in relational databases anyway.

There is more to say here, but there is no indication (no readilly-available DDD-related public benchmarks) that a graph-based data store would not be appropriate for doing DDD due to query performance.

Conclusions

I personally cannot find any (conceptual) reasons why a transaction-capable graph database would not be an ideal fit for doing Domain Driven Development, as an alternative to a relational database and ORM.

No data store will ever fit perfectly every use case, but we can ask the question if graph databases shouldn’t become the default for DDD, and relational the exception.

The disappearance of ORM would imply a great reduction of the complexity and the time that it takes to implement a project.

The future of DDD in the enterprise

The removal of the impedance mismatch and the improved performance of certain query types could be the killer features that drive the adoption of a graph based DDD solution.

We can see practical obstacles: operations prefer relational databases, vendor contract lock-in, having to learn a new query language, limited expertise in the labor market, etc.

But the economic advantage is there, and the technology is there also. And when that is case, it’s usually only a matter of time.

What about you, could you think of any reason why Graph-based DDD would not work? Feel free to chime in on the comments bellow.

Aleksey Novik

Software developer, Likes to learn new technologies, hang out on stackoverflow and blog on tips and tricks on Java/Javascript polyglot enterprise development.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Glenn
Glenn
10 years ago

Surely you recognize that you are suggesting a different sort of ORM?

jhadesdev
10 years ago
Reply to  Glenn

Hello, yes but much easier to use than an ORM and with increased performance for several common query types. It’s an OGM, like for example Hibernate OGM or Spring Data Neo4j.

Racket
Racket
10 years ago

The concept is sound but we tested lookup by id on neo4j vs MySQL and MySQL was faster. We also tested up to 6 joins vs the equivalent in neo4j and MySQL was still faster (both cold and warm cache).

jhadesdev
10 years ago
Reply to  Racket

Hello Racket, those are very interesting results. Do you have an idea what where the differences in case of the lookup by id and the 6-level joins?

There are benchmarks around but it’s at this moment hard to come to a conclusion. For the lookup by Id, it’s probably due to differences in the implementations. but the result for the 6 level joins is more surprising. It probably depends on how much of the total data set is being joined?

If you have more details at hand on the results it would be great.

Don
Don
10 years ago
Reply to  Racket

Did you tried the same benchmark with OrientDB? We did it in the past and OrientDB was much faster in both cold and warm caches.

Robert Friberg
10 years ago

Or if the data fits in RAM you could go with an in-memory object graph with write-ahead command logging, for instance using prevayler or origodb.

Jeff Nemecek
Jeff Nemecek
10 years ago

ORM is only hard if you try approach it from a “lazy” perspective. That is, if you try to create 1:1 bindings between an object model and a relational model. This is the great failing of ORM. By trying to devise a direct binding (as Hibernate does) between object and relational data models, you compromise both, benefiting neither at-scale (anything works in micro-scale). Graph databases are a great storage platform for certain structures, but are not a replacement for relational storage, which is also true in reverse. Shoe-horning relational functions into a graph database is as sinful as shoe-horning event… Read more »

Tom
Tom
9 years ago
Reply to  Jeff Nemecek

The problem with ORM was lightly addressed by the author. It’s not the technology, mapping, performance or learning curve – it’s the DBA. In most data centric organization the DBA is the final rule. Everything flows according to the DBA. If they can’t see a query in advance, test it, optimize it and approve it – it won’t be. Hibernate doesn’t work like this. You can’t submit a mapped entity or an HBM file to your DBA for approval. Other than that, the emmence overhead of redundant queries ORM frameworks produce for loading lazy relationships, refreshing the cache, updating keys… Read more »

Rusi Popov
Rusi Popov
9 years ago

Strange, nobody comments JEE and Entity EJB. In our company the legacy application uses a strange data base organization, which is closer to a graph database over an RDBMS. The on-line reporting on its database is extremely complex, whereas extending it is very complex. In 2004 this made us build another application. It uses standard Entity EJB 2.1 with Domain Classes, following a model-centric approach and Domain Driven Design. Additionally we use OLAP through JDBC. This works. It is matter of a good design of the architecture and the framework. EJB 2.1 in WebLogic really well maintains the relationships and… Read more »

Back to top button