Nested Elastic Explorations – Part 1

After my previous post on what to explore next, I’ve dabbled with option 4 “Linux” (but got frustrated, so I set it aside) and worked some on option 2 “More Gulp!” (but that was mostly at work). However, I think I’ve settled for now on putting some effort into learning Elasticsearch and Nest. So let’s start a series of posts on this venture. (No promises though, this “Part 1” post might end up being the only part…)

About this venture…

Elasticsearch can be seen as a NoSQL database, more specifically as a document database. I’ll be using my Bieb pet project (or at least its domain) for testing its features. Bieb’s domain should be familiar to everyone: books!

Specifically I’m writing this post because I quickly ran into a fundamental challenge with document databases, and I hope to gather my thoughts on the matter by writing about it. To set the stage, I’ll be talking about this part of the domain:

  • A Book has multiple Authors (of type Person).
  • A Person has authored multiple Books.

The classic many-to-many relationship example. The challenge then obviously is how to express this in the various tools.

Setting the coding stage…

Before I dive into the Elasticsearch bit, let me first describe how I got this going in C#, SQL, and NHibernate. First, the SQL bit:

Run-of-the-mill stuff, with a many-to-many table. Obviously, we don’t want to see BookAuthor show up as a class in our code. That is, we want this kind of C#:

As you can see, the Book is the “main” entity and the “boss” of the relationship with authors. Apart from the virtual keyword everywhere, little to no SQL or NHibernate know-how leaked into this domain.
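For reference, here is a minimal sketch of what such a domain could look like. Only the names AllAuthors and AuthoredBooks come from this post; Id, Title, Name and the AddAuthor method are assumptions made for illustration, not necessarily what Bieb actually contains.

```csharp
using System.Collections.Generic;

// Sketch only: members other than AllAuthors and AuthoredBooks are hypothetical.
public class Book
{
    public Book()
    {
        AllAuthors = new List<Person>();
    }

    public virtual int Id { get; set; }
    public virtual string Title { get; set; }
    public virtual IList<Person> AllAuthors { get; protected set; }

    // Book is the "boss": adding an author here also updates the other side.
    public virtual void AddAuthor(Person author)
    {
        if (AllAuthors.Contains(author))
        {
            return; // already linked, nothing to do
        }

        AllAuthors.Add(author);
        author.AuthoredBooks.Add(this);
    }
}

public class Person
{
    public Person()
    {
        AuthoredBooks = new List<Book>();
    }

    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    // The "inverse" side of the relationship, kept in sync by Book.AddAuthor.
    public virtual IList<Book> AuthoredBooks { get; protected set; }
}
```

Note that both sides hold a reference to each other, which is exactly what will matter once these objects get serialized.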

The NHibernate mapping looks like this:

The important part, and the link to Elasticsearch, is the inverse="true" attribute. With NHibernate, you have to indicate which entity is “in charge” of updating the many-to-many table. With document databases you have to make the same kind of decision.

Moving to Nest & Elasticsearch…

Diving head first into throwing these entities into Elasticsearch, I tried to run the following Nest integration test:

It’ll fail, because Nest internally uses JSON.Net to serialize the Book, which means you’ll get this:

Newtonsoft.Json.JsonSerializationException : Self referencing loop detected with type 'Bieb.Domain.Entities.Book'. Path 'allAuthors[0].authoredBooks'.

A problem that was to be expected.
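The failure can be reproduced without Nest or Elasticsearch at all. The sketch below uses plain JSON.Net with its default settings; the trimmed-down Book and Person classes and the LoopDemo helper are hypothetical stand-ins, not the actual Bieb entities or the original integration test.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Trimmed-down stand-ins for the domain entities (assumed shapes).
public class Book
{
    public string Title { get; set; }
    public List<Person> AllAuthors { get; } = new List<Person>();
}

public class Person
{
    public string Name { get; set; }
    public List<Book> AuthoredBooks { get; } = new List<Book>();
}

public static class LoopDemo
{
    public static string TrySerialize()
    {
        var book = new Book { Title = "Some Book" };
        var author = new Person { Name = "Some Author" };

        book.AllAuthors.Add(author);
        author.AuthoredBooks.Add(book); // this closes the cycle Book -> Person -> Book

        try
        {
            // By default JSON.Net treats a reference loop as an error.
            return JsonConvert.SerializeObject(book);
        }
        catch (JsonSerializationException ex)
        {
            return "Failed: " + ex.Message;
        }
    }
}
```

Calling LoopDemo.TrySerialize() should end up in the catch block, since the serializer runs into the same Book again while walking the author's AuthoredBooks.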

With SQL, self-referencing loops are just fine: the many-to-many table handles them. In NHibernate it’s a minor nuisance, simply handled by the inverse="true" bit. However, with document databases this is somewhat less trivial, in my opinion. In fact, it may be the hardest part of using them: how do you design your documents?

What I’d like to do…

The solution obviously is to break the self-referencing loop somewhere. Simply not serializing a Person’s authored books should do the trick. (On a side note, it occurs to me that graph databases would not need this trick, and would be great for a scenario with endless Book-Person-Book-… traversal, but that’s for another time.)

However, I don’t want to hard-code the loop-break into my domain object, e.g. by marking AuthoredBooks as not-serializable. The first and foremost reason is that this would mean my storage layer leaks into my domain layer. The second reason is that I may want to break the loop at different places depending on the Elasticsearch index I’m targeting. That is, I’ll probably have an index for Authors as well as Books, and want to break the loop in different places on either occasion.
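One direction that might satisfy both requirements is a custom JSON.Net contract resolver that is told from the outside which properties to skip. The sketch below is an assumption-laden experiment, not a verified solution: IgnorePropertiesResolver and BookIndexJson are hypothetical names, the Book and Person classes are simplified stand-ins, and how to plug such a resolver into Nest is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Simplified stand-ins for the domain entities; note they carry no JSON attributes.
public class Book
{
    public string Title { get; set; }
    public List<Person> AllAuthors { get; } = new List<Person>();
}

public class Person
{
    public string Name { get; set; }
    public List<Book> AuthoredBooks { get; } = new List<Book>();
}

// Hypothetical resolver: skips properties that were registered via Ignore(...).
public class IgnorePropertiesResolver : DefaultContractResolver
{
    private readonly Dictionary<Type, HashSet<string>> ignored =
        new Dictionary<Type, HashSet<string>>();

    // Register properties before the resolver is first used, because
    // resolved contracts are cached afterwards.
    public IgnorePropertiesResolver Ignore(Type type, string propertyName)
    {
        HashSet<string> names;
        if (!ignored.TryGetValue(type, out names))
        {
            names = new HashSet<string>();
            ignored[type] = names;
        }
        names.Add(propertyName);
        return this;
    }

    protected override JsonProperty CreateProperty(
        MemberInfo member, MemberSerialization memberSerialization)
    {
        var property = base.CreateProperty(member, memberSerialization);

        HashSet<string> names;
        if (member.DeclaringType != null
            && ignored.TryGetValue(member.DeclaringType, out names)
            && names.Contains(member.Name))
        {
            property.Ignored = true; // break the loop at this property
        }

        return property;
    }
}

public static class BookIndexJson
{
    // For a "books" index: keep a Book's authors, drop each author's books.
    public static string Serialize(Book book)
    {
        var settings = new JsonSerializerSettings
        {
            ContractResolver = new IgnorePropertiesResolver()
                .Ignore(typeof(Person), "AuthoredBooks")
        };
        return JsonConvert.SerializeObject(book, settings);
    }
}
```

An index for authors could use a second resolver instance that ignores Book.AllAuthors instead, so the loop is broken in a different place per index while the domain classes stay untouched.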

Questions…

So some questions remain. How could we do this with JSON.Net without altering the domain layer? Or is there a way to do this with Elasticsearch + Nest? Or should we bite the bullet and have a separate set of DTOs, mapped to and from our domain entities, to represent how they’re persisted in Elasticsearch? Or do we need a different approach altogether?

Good questions. Time to go and find out!