Entity Framework: Cascading Delete of Optional Related Entity

After several years of NHibernate, and a couple of years of Dapper and NoSQL, I’m now working on a project that uses Entity Framework as its ORM. It’s a mature ORM at this point, but it still gives me a headache as I struggle to find ways to do certain things. I’ll admit that I’m trying to dive in without sitting down and doing a few hours of learning first; for sure I’ll be on Pluralsight some time soon.

Fair warning: what comes next is a highly specific, rather technical, and possibly stupidly uninformed dump of a problem I ran into. Feel free to skip this one: no hard feelings, and I’ll see you on my next post!

The main problem: I’ve got a repro, but it’s extremely similar to various other questions on Stack Overflow. Except… my code already contains the code from the accepted (and often highly upvoted) answers, yet it still doesn’t work as advertised!

What I’m trying to do is make sure that EF will automatically delete a “Child” (i.e. the database row) when its “Parent” is explicitly deleted via the DbContext. Note that the Child is optional in my scenario.

Here’s the repro. Create a new class library and install EntityFramework (I used 6.1.3) and NUnit (I used 3.6.1). Then drop these entities into one namespace:
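They look essentially like this (a minimal sketch: an optional one-to-one between Parent and Child, which is the shape I described above; the exact names are illustrative):

```csharp
public class Parent
{
    public int Id { get; set; }

    // Optional: a Parent may or may not have a Child.
    public virtual Child Child { get; set; }
}

public class Child
{
    public int Id { get; set; }

    public virtual Parent Parent { get; set; }
}
```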

And then this DbContext:
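Along these lines (again a sketch; the fluent mapping is one of the Stack Overflow-approved variants I alluded to, and the connection string is an assumption):

```csharp
using System.Data.Entity;

public class TestDbContext : DbContext
{
    public TestDbContext()
        : base(@"Data Source=(LocalDb)\cascades;Initial Catalog=TestDb;Integrated Security=True")
    {
    }

    public DbSet<Parent> Parents { get; set; }
    public DbSet<Child> Children { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Optional one-to-one, with the cascade that the upvoted answers promise:
        modelBuilder.Entity<Parent>()
            .HasOptional(p => p.Child)
            .WithOptionalPrincipal(c => c.Parent)
            .WillCascadeOnDelete(true);
    }
}
```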

And finally this TestFixture:
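Roughly like this (sketch; the SetUp assumes EF’s default pluralized table names):

```csharp
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class CascadeTests
{
    [SetUp]
    public void ClearDatabase()
    {
        using (var context = new TestDbContext())
        {
            // Assumes EF's pluralized default table names.
            context.Database.ExecuteSqlCommand("DELETE FROM Children; DELETE FROM Parents;");
        }
    }

    [Test]
    public void Delete_will_cascade()
    {
        using (var context = new TestDbContext())
        {
            context.Parents.Add(new Parent { Child = new Child() });
            context.SaveChanges();
        }

        using (var context = new TestDbContext())
        {
            context.Parents.Remove(context.Parents.Single());
            context.SaveChanges();

            Assert.That(context.Parents.Count(), Is.EqualTo(0), "Parent");
            Assert.That(context.Children.Count(), Is.EqualTo(0), "Child");
        }
    }
}
```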

Make sure the “TestDb” database is available on your “(LocalDb)\cascades” instance (or use another SQL Server instance and database). Then run the tests, and you’ll get:

Test Failed – Delete_will_cascade.
Message: “Child”.
Expected: 0, but was: 1.

All I want is for a “Child” to be deleted along with its “Parent” when the latter is deleted. I know I can use a SQL CASCADE, and I know I could manually remove the Child from the context, but I want EF to handle this automatically, damnit!
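(For reference, that manual workaround would be something along these lines; it’s exactly the bookkeeping I want EF to do for me.)

```csharp
// Requires "using System.Data.Entity;" for the lambda-based Include.
using (var context = new TestDbContext())
{
    var parent = context.Parents.Include(p => p.Child).Single();

    // Explicitly remove the Child before removing its Parent.
    if (parent.Child != null)
    {
        context.Children.Remove(parent.Child);
    }

    context.Parents.Remove(parent);
    context.SaveChanges();
}
```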

For sure I’ve missed the obvious solution. I was somewhat hoping it would come to me while writing this. But perhaps I just need a good night’s rest.



Nested Elastic Explorations – Part 1

After my previous post on what to explore next, I’ve dabbled with option 4, “Linux” (but got frustrated, so I set it aside), and worked some on option 2, “More Gulp!” (but that was mostly at work). For now, however, I think I’ve settled on putting some effort into learning Elasticsearch and Nest. So let’s start a series of posts on this venture. (No promises though; this “Part 1” post might end up being the only part…)

About this venture…

Elasticsearch can be seen as a NoSQL database, more specifically as a document database. I’ll be using my Bieb pet project (or at least its domain) for testing its features. Bieb’s domain should be familiar to everyone: books!

Specifically I’m writing this post because I quickly ran into a fundamental challenge with document databases, and I hope to gather my thoughts on the matter by writing about it. To set the stage, I’ll be talking about this part of the domain:

  • A Book has multiple Authors (of type Person).
  • A Person has authored multiple Books.

The classic many-to-many relationship example. The challenge then obviously is how to express this in the various tools.

Setting the coding stage…

Before I dive into the Elasticsearch bit, let me first describe how I got this going in C#, SQL, and NHibernate. First, the SQL bit:
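It comes down to this (simplified; Bieb’s actual schema has more columns):

```sql
CREATE TABLE Person (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(256) NOT NULL
);

CREATE TABLE Book (
    Id INT IDENTITY PRIMARY KEY,
    Title NVARCHAR(256) NOT NULL
);

-- The many-to-many table linking Books to their Authors.
CREATE TABLE BookAuthor (
    BookId INT NOT NULL REFERENCES Book (Id),
    PersonId INT NOT NULL REFERENCES Person (Id),
    PRIMARY KEY (BookId, PersonId)
);
```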

Run-of-the-mill stuff, with a many-to-many table. Obviously, we don’t want to see BookAuthor show up as a class in our code. That is, we want this kind of C#:
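That is, something like this (a sketch of the relevant bits):

```csharp
using System.Collections.Generic;

public class Book
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }

    // Book is the "main" entity: it owns the relationship with authors.
    public virtual ISet<Person> AllAuthors { get; set; } = new HashSet<Person>();
}

public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    public virtual ISet<Book> AuthoredBooks { get; set; } = new HashSet<Book>();
}
```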

As you can see, the Book is the “main” entity and the “boss” of the relationship with authors. Apart from the virtual keyword everywhere, little to no SQL or NHibernate know-how leaked into this domain.

The NHibernate mapping looks like this:
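Trimmed down to the relevant bits, it’s roughly this (hbm.xml sketch; column names are assumptions):

```xml
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   assembly="Bieb.Domain" namespace="Bieb.Domain.Entities">
  <class name="Book">
    <id name="Id"><generator class="identity" /></id>
    <property name="Title" />
    <set name="AllAuthors" table="BookAuthor">
      <key column="BookId" />
      <many-to-many class="Person" column="PersonId" />
    </set>
  </class>
  <class name="Person">
    <id name="Id"><generator class="identity" /></id>
    <property name="Name" />
    <!-- inverse="true": Person does not manage the BookAuthor table -->
    <set name="AuthoredBooks" table="BookAuthor" inverse="true">
      <key column="PersonId" />
      <many-to-many class="Book" column="BookId" />
    </set>
  </class>
</hibernate-mapping>
```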

The important bit, and the link to Elasticsearch, is the inverse="true" bit. With NHibernate, you have to indicate which entity is “in charge” of updating the many-to-many table. With document databases you have to do the same thing.

Moving to Nest & Elasticsearch…

Diving head first into throwing these entities into Elasticsearch, I tried to run the following Nest integration test:
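Something like this (sketch; it assumes a local node on localhost:9200 and the NEST 1.x-style client setup):

```csharp
using System;
using Nest;
using NUnit.Framework;

[TestFixture]
public class BookIndexTests
{
    [Test]
    public void Can_index_a_book_with_authors()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"), "bieb");
        var client = new ElasticClient(settings);

        var book = new Book { Title = "Ab Urbe Condita" };
        var author = new Person { Name = "Titus Livius" };
        book.AllAuthors.Add(author);
        author.AuthoredBooks.Add(book); // ...and there's the loop.

        var response = client.Index(book);

        Assert.That(response.IsValid);
    }
}
```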

It’ll fail, because Nest internally uses JSON.Net to serialize the Book, which means you’ll get this:

Newtonsoft.Json.JsonSerializationException : Self referencing loop detected with type ‘Bieb.Domain.Entities.Book’. Path ‘allAuthors[0].authoredBooks’.

A problem that was to be expected.

With SQL, self-referencing loops are just fine: they’re handled by the many-to-many table. In NHibernate it’s a minor nuisance, simply handled by the inverse="true" bit. With document databases, however, this is somewhat less trivial, in my opinion. In fact, it may be the hardest part of using them: how do you design your documents?

What I’d like to do…

The solution obviously is to break the self-referencing loop somewhere. Simply not serializing a Person’s authored books should do the trick. (On a side note, it occurs to me that graph databases would not need this trick, and would be great for a scenario with endless Book-Person-Book-… traversal, but that’s for another time.)

However, I don’t want to hard-code the loop-break into my domain object, e.g. by marking AuthoredBooks as not-serializable. The first and foremost reason is that this would mean my storage layer leaks into my domain layer. The second reason is that I may want to break the loop at different places depending on the Elasticsearch index I’m targeting. That is, I’ll probably have an index for Authors as well as Books, and want to break the loop in different places on either occasion.
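Just to be clear, that hard-coded loop-break would be trivially easy; JSON.Net even has an attribute for it:

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    // Breaks the loop, but leaks a persistence concern into the domain,
    // and forces the same break for every index.
    [JsonIgnore]
    public virtual ISet<Book> AuthoredBooks { get; set; } = new HashSet<Book>();
}
```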

Questions…

So some questions remain. How could we do this with JSON.Net without altering the domain layer? Or is there a way to do this with Elasticsearch + Nest? Or should we bite the bullet and have a separate set of DTOs mapped to and from our domain entities, representing how they’re persisted in Elasticsearch? Or do we need a different approach altogether?

Good questions. Time to go and find out!

All roads lead to Excel, even those from SQL

My first employer provided me with some valuable insight:

Microsoft Excel is the main competition for any piece of software.

Over the years this statement has proven true an alarming number of times. And it makes sense too. Everyone knows how to use Excel, and it’s extremely flexible, especially when you’re working with any kind of (tabular) data. In other words: all roads lead to Excel.

Roads from SQL also often lead to Excel, even though they’re not always pretty. Sure, if you’re on foot or horseback, with a limited amount of luggage, the road will be fine. However, here’s a particular scenario that obscures the path.

Scenario

These are the basic constraints:

  • Available tools: MSSQL 2012, SSRS, SSIS, Visual Studio 2012 & .NET 4.5.
  • Excel versions: either XLS (2003 and below) or XLSX (2007 and up, slightly preferred) will do.
  • Form of data: combination of normalized and denormalized data (see below).
  • Amount of data: tops out at 250,000 rows (times 20 when unpivoted).
  • Required response time: live exports that should run within seconds.
  • Databases: many instances each with the exact same schema but different data.

So there’s access to the latest and greatest Microsoft tools, and the option to include custom components. Free and open source components are preferred, but buying tools and components is also an option.

Data

Here’s a simplified version of how the data is modeled:

  • Person is a “flat” table, containing some columns that have “fixed” personal details.
  • Property and Value allow for custom (normalized) Person fields.
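In table form, that comes down to something like this (a sketch: PersonValue is my name for the link table, and the SortOrder columns back the ordering I’ll mention under the Bonus Objectives; see the SqlFiddle below for the real sample data):

```sql
CREATE TABLE Person (
    Id INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    Surname NVARCHAR(100) NOT NULL
    -- ...in reality some 30 "fixed" columns...
);

CREATE TABLE Property (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    SortOrder INT NOT NULL
);

CREATE TABLE [Value] (
    Id INT IDENTITY PRIMARY KEY,
    PropertyId INT NOT NULL REFERENCES Property (Id),
    Name NVARCHAR(100) NOT NULL,
    SortOrder INT NOT NULL
);

-- Which Person has which Value, plus optional free text.
CREATE TABLE PersonValue (
    PersonId INT NOT NULL REFERENCES Person (Id),
    ValueId INT NOT NULL REFERENCES [Value] (Id),
    CustomValue NVARCHAR(256) NULL,
    PRIMARY KEY (PersonId, ValueId)
);
```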

Here’s a visual of this simplified model:

Database model for Person, Property, and Value

You can view the SqlFiddle with some sample data. A typical query to get the data that’s going to be our starting point:
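Something along these lines (assuming the sketched PersonValue link table from above; the real query is in the SqlFiddle):

```sql
SELECT per.Id, per.FirstName, per.Surname,
       prp.Id, prp.Name AS PropName,
       val.Id, val.Name AS ValName,
       pv.CustomValue
FROM Person per
JOIN PersonValue pv ON pv.PersonId = per.Id
JOIN [Value] val ON val.Id = pv.ValueId
JOIN Property prp ON prp.Id = val.PropertyId
ORDER BY per.Id, prp.SortOrder, val.SortOrder;
```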

This will give output similar to this:

Id  FirstName  Surname  Id  PropName   Id  ValName      CustomValue
1   John       Doe      1   Trait      2   Bold         NULL
1   John       Doe      1   Trait      3   Easygoing    NULL
1   John       Doe      2   Eye color  4   Green        NULL
1   John       Doe      3   Pet name   7   Placeholder  Fluffybunny
2   Mark       Roe      1   Trait      3   Easygoing    NULL
3   Mary       Fisch    2   Eye color  6   Other…       Red-brown-ish
3   Mary       Fisch    3   Pet name   7   Placeholder  Ravi

Note that in reality I’m dealing with these figures:

  • About 30 columns on the Person table;
  • About 20 different Properties with about 6 possible Values on average;
  • Anywhere between 100 and 250,000 Persons;
  • Usually between 0 and 2 Values per Person per Property.

For one, this means that the normal output of the above query repeats a lot of redundant information (i.e. the 30-ish Person columns on every single row).

Target Output

The business requirement here when moving this data to Excel should be obvious: the data should be pivoted. Each “Property” should become a group of columns, with one column per “Value”. A table says more than a thousand words; this is the requested output:

                        Trait            Eye color             Pet name
Id  FirstName  Surname  Bold  Easygoing  Green  Other…         Placeholder
1   John       Doe      x     x          x                     Fluffybunny
2   Mark       Roe            x
3   Mary       Fisch                            Red-brown-ish  Ravi

Something along these lines is what the business users would like to see.

Bonus Objectives

Getting the target output is a challenge in itself. I’m not done yet though; here are some bonus objectives (with MoSCoW indications):

  • Properties and Values both have ordering, the order of columns Should respect that.
  • Any solution Should allow for some styling (fonts, borders, backgrounds). It’d be Nice to have further control, for example to enable a theme, alternate row coloring, etc.
  • I Would like to have a place for metadata (date exported, etc.) somewhere in the generated file.
  • Localization of the column headers (where applicable) would be Nice to have.
  • It’d be Nice to be able to reuse much of the solution in generating XML files instead of Excel sheets.
  • Any solution Must be solid and maintainable.
  • Any solution Must run on moderate hardware without hogging resources.

Current Solution

Right now, the above is accomplished using Reporting Services. This works decently well for datasets containing no more than a few thousand Person rows with related Property Values.

However, beyond about 3000 records performance quickly starts to degrade. This isn’t entirely unexpected, because Reporting Services isn’t really meant for this task (it’s much better at showing aggregates than at exporting large volumes of data).

Possible Solutions

There are many possible solutions. I’m currently considering a subset of them (some solutions merely for “benchmark” purposes):

  • SSIS packages. The tool seems meant for the job. I do hold a grudge against the tool, but maybe it’s time to get over that.
  • Dynamic SQL + generated RDLs. Use DynSQL to do the pivoting (a rough sketch of that part follows below this list). This requires generated RDL files, because the fields of a query must be known up front to SSRS.
  • Dynamic SQL + OPENROWSET + OleDB. Use DynSQL to do the pivoting, and export it straight to Excel using OleDB.
  • FOR XML queries into OpenXML. The basic idea: fast FOR XML queries, possibly an XSLT, generating OpenXML data, and plugging that into a basic XLSX.
  • ORM or ADO.NET into OpenXML using an XmlWriter. Something along those lines.
  • BCP directly into Excel files. Haven’t looked into this yet, but it may be an option.
  • SQL CLR. Not sure how this would work (if at all), but there might be options here.
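To give an idea of the Dynamic SQL direction, here’s a rough sketch of the pivoting part (it reuses the hypothetical PersonValue table and SortOrder columns from earlier, and naively assumes Value names are unique across Properties):

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the column list from the data, e.g.: [Bold],[Easygoing],[Green],...
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(val.Name)
    FROM [Value] val
    JOIN Property prp ON prp.Id = val.PropertyId
    ORDER BY prp.SortOrder, val.SortOrder
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

-- Pivot: an 'x' for plain values, the free text for custom ones.
SET @sql = N'
SELECT Id, FirstName, Surname, ' + @cols + N'
FROM (
    SELECT per.Id, per.FirstName, per.Surname,
           val.Name AS ValName,
           COALESCE(pv.CustomValue, N''x'') AS Cell
    FROM Person per
    JOIN PersonValue pv ON pv.PersonId = per.Id
    JOIN [Value] val ON val.Id = pv.ValueId
) AS src
PIVOT (MAX(Cell) FOR ValName IN (' + @cols + N')) AS pvt;';

EXEC sys.sp_executesql @sql;
```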

Now it’s time for me to try some of them, see which one fits the requirements best.

Pivot vs Unpivot

When mnemonics and learning by heart fail: write a blog post! So here is my own personal “which is which” reference post about the Pivot and Unpivot operations. First things first:

Example of the pivot and unpivot operation

Using the great Scipio Africanus and Hannibal as puppets, the above example shows the two operations in their most basic form: pivot turns rows into columns, and unpivot turns columns back into rows. The pivot operation would be along these lines in T-SQL:
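For instance (with made-up numbers for the two generals):

```sql
-- Source data: one row per general per year.
CREATE TABLE Battle (General NVARCHAR(50), [Year] INT, Wins INT);
INSERT INTO Battle VALUES
    ('Hannibal', 218, 2),
    ('Hannibal', 216, 1),
    ('Scipio Africanus', 202, 1);

-- PIVOT: the distinct General values become columns.
SELECT [Year], [Hannibal], [Scipio Africanus]
FROM Battle
PIVOT (SUM(Wins) FOR General IN ([Hannibal], [Scipio Africanus])) AS pvt;
```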

The query above is also available as a SQL Fiddle, so you can toy around with it some more; many more complex queries are possible with these operations.

Enjoy!