Nested Elastic Explorations – Part 2

Reusing a properly modelled domain for storing data in Elasticsearch does not work well out of the box. Let’s examine a problem scenario. Consider this mini-domain:

[Figure: Mini Bieb Domain]

This ties in with my last post, where I mentioned that loops are a pain when serializing to JSON. Here’s the loop, visualized:

[Figure: Mini Bieb Domain Loop]

The problem is that Newtonsoft (used under the hood by Nest) will start serializing “The Greatest Book” and recurse through all of its properties. Eventually it will try to serialize “The Greatest Book” again, as part of “Richard Roe”’s AuthoredBooks property.

Breaking this serialization loop is actually pretty simple with Newtonsoft, and for a while now you have been able to inject the appropriate Newtonsoft setting (ReferenceLoopHandling) in Nest as well.
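To make the loop and the setting concrete, here is a minimal sketch using Newtonsoft on its own. The Book and Person classes are hypothetical stand-ins for the mini-domain (only AuthoredBooks is a name taken from this post; Title, Name and Authors are my assumptions), and LoopHandlingSketch is just an illustrative helper. How the same setting gets handed to Nest depends on the Nest version, so that part is deliberately left out.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical stand-ins for the mini-domain. Only AuthoredBooks is named
// in the post; the other members are assumptions for illustration.
public class Person
{
    public string Name { get; set; }
    public List<Book> AuthoredBooks { get; set; } = new List<Book>();
}

public class Book
{
    public string Title { get; set; }
    public List<Person> Authors { get; set; } = new List<Person>();
}

public static class LoopHandlingSketch
{
    // Builds the smallest possible loop: book -> author -> same book.
    public static Book BuildLoop()
    {
        var book = new Book { Title = "The Greatest Book" };
        var author = new Person { Name = "Richard Roe" };
        book.Authors.Add(author);
        author.AuthoredBooks.Add(book); // this back-reference closes the loop
        return book;
    }

    // With the default settings (ReferenceLoopHandling.Error) Newtonsoft
    // throws a JsonSerializationException once it runs into the loop.
    public static string SerializeWithDefaults(Book book)
    {
        return JsonConvert.SerializeObject(book);
    }

    // With ReferenceLoopHandling.Ignore the serializer skips any object
    // it is already in the middle of writing, instead of throwing.
    public static string SerializeIgnoringLoops(Book book)
    {
        var settings = new JsonSerializerSettings
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        };
        return JsonConvert.SerializeObject(book, settings);
    }
}
```

Calling SerializeWithDefaults(LoopHandlingSketch.BuildLoop()) should throw, while SerializeIgnoringLoops returns JSON in which the author no longer points back at the book.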

Problem solved, right? Not so much. Here’s why. Suppose I use the LoopHandling “fix” and load up the mini-domain in an integration test.

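This will create a single document in Elasticsearch of a whopping 71 KB / 1364 lines (see this example JSON file). Not so good. Ignoring loops only stops the serializer from re-entering an object it is already in the middle of writing; everything else reachable from the book still gets nested into the document.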

The simple solution, which would do for now, is to index only Book items along with all related people (authors, editors, translators), but not those people’s books (AuthoredBooks, etc.). We somehow need to let Nest and Elasticsearch know that we want to stop recursion right there. The question is how to be explicit about the way they should map my domain objects to documents. I see two courses of action I like:

  1. Declarative mapping, with Attributes. This would (to my taste) require separate DTO classes to represent the documents in Elasticsearch, and an explicit transformation between those DTOs and my domain objects. (I wouldn’t like to litter my domain object classes with persistence-specific attributes.)
  2. Mapping by code. This would seemingly allow me to keep using domain object classes for persistence, with the “Mappings” living in code, in separate files, as a strategy for the transformation. At this point, though, I’m unsure whether this approach will “hold up” once you start adding more complex properties and logic to domain objects.
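As a rough idea of what the explicit transformation in option 1 could look like, here is a sketch in plain C#. All class and member names (BookDocument, PersonDocument, BookDocumentMapper, Title, Name, Authors, Editors, Translators) are hypothetical, and the Nest mapping attributes are left out on purpose because their names differ between Nest versions. The point is only that the document classes stop the recursion by construction: a PersonDocument simply has no books.

```csharp
using System.Collections.Generic;
using System.Linq;

// Domain side: hypothetical shape of the mini-domain, loops included.
public class Person
{
    public string Name { get; set; }
    public List<Book> AuthoredBooks { get; set; } = new List<Book>();
}

public class Book
{
    public string Title { get; set; }
    public List<Person> Authors { get; set; } = new List<Person>();
    public List<Person> Editors { get; set; } = new List<Person>();
    public List<Person> Translators { get; set; } = new List<Person>();
}

// Document side: what would actually be indexed. Nest attributes
// would go on these classes, keeping the domain classes clean.
public class PersonDocument
{
    public string Name { get; set; }
    // Deliberately no AuthoredBooks: recursion stops here.
}

public class BookDocument
{
    public string Title { get; set; }
    public List<PersonDocument> Authors { get; set; }
    public List<PersonDocument> Editors { get; set; }
    public List<PersonDocument> Translators { get; set; }
}

public static class BookDocumentMapper
{
    // Copies a Book and its related people, one level deep only.
    public static BookDocument ToDocument(Book book)
    {
        return new BookDocument
        {
            Title = book.Title,
            Authors = ToDocuments(book.Authors),
            Editors = ToDocuments(book.Editors),
            Translators = ToDocuments(book.Translators)
        };
    }

    private static List<PersonDocument> ToDocuments(IEnumerable<Person> people)
    {
        return people.Select(p => new PersonDocument { Name = p.Name }).ToList();
    }
}
```

Because BookDocument contains no path back to a Book, it can be serialized without any loop handling at all, at the cost of maintaining the mapper by hand.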

I lean towards option 1, even though it feels like it’ll be more work. Guess there’s only one way to find out…