Nested Elastic Explorations – Part 2

Reusing a properly modelled domain for storing data in Elasticsearch does not work well out of the box. Let’s examine a problem scenario. Consider this mini-domain:

Mini Bieb Domain

This ties in with my last post, where I mentioned that loops are a pain for serializing to json. Here’s the loop, visualized:

Mini Bieb Domain Loop

The problem is that NewtonSoft (used under the hood by Nest) will start serializing “The Greatest Book”, and recurses through all properties. In the end it’ll try to serialize “The Greatest Book” again as part of “Richard Roe”‘s AuthoredBooks property.

Breaking this serialization loop is actually pretty simple with NewtonSoft, and since a while you can inject the appropriate NewtonSoft setting in Nest as well. Something like this:

Problem solved, right? Not so much. Here’s why. Suppose I use the LoopHandling “fix” and load up the mini-domain with this integration test:

This will create a document in Elasticsearch of a whopping 71 KB / 1364 lines, see this example JSON file. Not so good.

The simple solution which would do for now would be to index only Book items, and all related people (authors, editors, translators), but not those people’s Books (AuthoredBooks, etc). We somehow need to let Nest and Elasticsearch know that we want to stop recursion right there. The question is how to be explicit about how they should map my domain objects to documents. I see two courses of action I like:

  1. Declarative mapping, with Attributes. This would (to my taste) require separate DTO classes to represent the documents in Elasticsearch, and have an explicit transformation between those DTO’s and my Domain objects. (I wouldn’t like to litter my domain object classes with persistence-specific attributes.)
  2. Mapping by code. This would seemingly allow me to keep using domain object classes for persistance, having the “Mappings” in code as a strategy for the transformation in separate files. At this point though I’m unsure if this approach will “hold up” once you start adding more complex properties and logic to domain objects.

I lean towards option 1, even though it feels like it’ll be more work. Guess there’s only one way to find out…

Nested Elastic Explorations – Part 1

After my previous post on what to explore next I’ve dabbled with option 4 “Linux” (but got frustrated so I set it aside), worked some on option 2 “More Gulp!” (bu that was mostly at work). However, I think I’ve settled for now with putting some effort in learning Elasticsearch and Nest. So let’s start a series of posts on this venture. (No promises though, this “Part 1” post might end up being the only part…)

About this venture…

Elasticsearch can be seen as a NoSQL database, more specifically as a document database. I’ll be using my Bieb pet project (or at least its domain) for testing its features. Bieb’s domain should be familiar to everyone: books!

Specifically I’m writing this post because I quickly ran into a fundamental challenge with document databases, and I hope to gather my thoughts on the matter by writing about it. To set the stage, I’ll be talking about this part of the domain:

  • A Book has multiple Authors (of type Person).
  • A Person has authored mulitple Books.

The classic many-to-many relationship example. The challenge then obviously is how to express this in the various tools.

Setting the coding stage…

Before I dive into the Elasticsearch bit, let me first describe how I got this going in C#, SQL, and NHibernate. First, the SQL bit:

Run-of-the-mill stuff, with a many-to-many table. Obviously, we don’t want to see BookAuthor show up as a class in our code. That is, we want this kind of C#:

As you can see, the Book is the “main” entity and the “boss” of the relationship with authors. Apart from the virtual keyword everywhere, little to no SQL or NHibernate know-how leaked into this domain.

The NHibernate mapping looks like this:

The important bit, and the link to Elasticsearch, is the inverse="true" bit. With NHibernate, you have to indicate which entity is “in charge” of updating the many-to-many table. With document databases you have to do the same thing.

Moving to Nest & Elasticsearch…

Diving head first into throwing these entities into Elasticsearch, I tried to run the following Nest integration test:

It’ll fail, because Nest internally uses JSON.Net to serialize the Book, which means you’ll get this:

Newtonsoft.Json.JsonSerializationException : Self referencing loop detected with type ‘Bieb.Domain.Entities.Book’. Path ‘allAuthors[0].authoredBooks’.

A problem that was to be expected.

With SQL self-referencing loops are just fine, handled by the many-to-many table. In NHibernate it’s a minor nuisance, simply handled by the inverse="true" bit. However, with document databases this is somewhat less trivial, in my opinion. In fact, it may be the hardest part of using them: how do you design your documents?

What I’d like to do…

The solution obviously is to break the self-referencing loop somewhere. Simply not serializing a Person’s authored books should do the trick. (On a side note, it occurs to me that graph databases would not need this trick, and would be great for a scenario with endless Book-Person-Book-… traversal, but that’s for another time.)

However, I don’t want to hard-code the loop-break into my domain object, e.g. by marking AuthoredBooks as not-serializable. The first and foremost reason is that this would mean my storage layer leaks into my domain layer. The second reason is that I may want to break the loop at different places depending on the Elasticsearch index I’m targeting. That is, I’ll probably have an index for Authors as well as Books, and want to break the loop in different places on either occasion.

Questions…

So some questions remain. How could we do this with JSON.Net without altering the domain layer? Or is there a way to do this with Elasticsearch + Nest? Or should we bite the bullet and have a seperate set of DTO’s to/from our domain entities to represent how they’re persisted in Elasticsearch? Or do we need a different approach alltogether?

Good questions. Time to go and find out!

A New Hope

Did I mention before that I write here to collect my thoughts and as writing practice? Well, in any case, after my previous monster post, and given that it’s the start of a new year, I figured I’d do a “New Beginnings” post. Then the sci-fi addict in me awoke and typed “A New Hope“, even though I’m a Trekkie at heart. But I digress.

I wanted to summarize which technologies and experiments I’m most interested in pursuing next. Let me try to list them in order of current preference.

1. Elasticsearch …. and Nest?

Elastic as a (primary?) data store intrigues me. It feels like a second chance for Lucene towards me (experimenting with Solr didn’t “click” for me). There recently was a major version released so it seems like a great time to start investigating. I’m hesitant about Nest though, which you’d kind-of have to use if you want to access Elastic with .NET: during my first few experiments I felt like I was “wrestling” the API.

2. Gulp, or: More Gulp!

I’ve done quite some work with Gulp recently, and it was a very (and surprisingly) fun experience! In the past, most devops tasks felt like chores to me, but not so much with this Gulp. So this point would actually be “More Gulp” (as I’ve done quite a bit of it recently already), but I still have quite a list of things I want to try out, like for instance setting up JSCS.

3. Node

Yes, yes, I know: I’d be a hipster-after-the-fact if I’d start on Node now. Even so, I enjoy writing Javascript, and Node would be a way to do even more of it in my pet projects. Not sure how I would be utilizing Node yet: with ExpressJS, for custom (Gulp) packages, general scripting, or perhaps something like project Euler.

4. Linux

This one’s been on my mind for a while now: perhaps I’ve been stuck too much in my Windows-comfort-zone. It would be good to shape up my base knowledge of Linux as a development environment.

5. Grunt

Why yes, in addition to Gulp there’s also Grunt on this list. Gotta know what’s on the other side of the fence to be able to compare, right?


I’m placing a short but distinct separation here, because these are currently mostly “in reserve” kind of ideas.


6. Couchbase

Working a little with this at work as well. There’s a few things I’d be excited about to try at home, mostly features from version 3 and 4. Would need a “subject” though to make it interesting…

7. Study three-way-diff-tools

For some reason I never got my mind wrapped around these tools. And that’s frustrating. I should just spend a few hours trying to understand the details of these tools… I guess.

8. Raspberry / something IoTish

This one’s on the list because it nags me that I don’t see what all the rage with IoT is (or is it “was”?) about, and for that reason I feel like I have to “make Slackbot call a webhook that calls my webserver that calls my Raspberry which turns on a LED strip and autopilots a drone around in my room”. Or something.

9. Android app

I’m curious how the Android-dev-experience compares to my short experiments from around 2011. This would also allow me to revisit Java. But on a whole I feel I’d have to invest a lot of time or not do this one at all.

10. Closure Compiler

It’s on the list because I find the idea of it intrinsically interesting. It’s low on the list because I’m not sure if I work on things personally or professionally at the moment where this would be worth the investment.


Wow, that worked like a charm! Apologies for the stream-of-consciousness post, but now I do know what to do next! And I’m not even going back to adjust the (order of the) list accordingly, I’m just gonna leave you all hanging out there, guessing at what’s next…