Entity Framework: Cascading Delete of Optional Related Entity

After several years of NHibernate, and a couple of years of Dapper and NoSQL, I’m now working on a project that uses Entity Framework as its ORM. It’s a mature ORM at this point, but it still gives me a headache as I struggle to find ways to do certain things. I’ll admit that I’m trying to dive in without sitting down and doing a few hours of learning first; for sure I’ll be on Pluralsight some time soon.

Fair warning: what comes next is a highly specific, rather technical, and possibly stupidly uninformed dump of a problem I ran into. Feel free to skip this one: no hard feelings, and I’ll see you on my next post!

The main problem: I’ve got a repro, but it’s extremely similar to various other questions on Stack Overflow. Except… my code already contains the code from the accepted (and often highly upvoted) answers, yet it still doesn’t work as advertised!

What I’m trying to do is make sure that EF will automatically delete a “Child” (i.e. the database row) when its “Parent” is explicitly deleted via the DbContext. Note that the Child is optional in my scenario.

Here’s the repro. Create a new class library and install EntityFramework (I used 6.1.3) and NUnit (I used 3.6.1). Then drop these entities into one namespace:
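They look essentially like this (a minimal sketch: an optional one-to-one between Parent and Child, which is the shape I described above; the exact names are illustrative):

```csharp
public class Parent
{
    public int Id { get; set; }

    // Optional: a Parent may or may not have a Child.
    public virtual Child Child { get; set; }
}

public class Child
{
    public int Id { get; set; }

    public virtual Parent Parent { get; set; }
}
```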

And then this DbContext:
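Along these lines (again a sketch; the fluent mapping is one of the Stack Overflow-approved variants I alluded to, and the connection string is an assumption):

```csharp
using System.Data.Entity;

public class TestDbContext : DbContext
{
    public TestDbContext()
        : base(@"Data Source=(LocalDb)\cascades;Initial Catalog=TestDb;Integrated Security=True")
    {
    }

    public DbSet<Parent> Parents { get; set; }
    public DbSet<Child> Children { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Optional one-to-one, with the cascade that the upvoted answers promise:
        modelBuilder.Entity<Parent>()
            .HasOptional(p => p.Child)
            .WithOptionalPrincipal(c => c.Parent)
            .WillCascadeOnDelete(true);
    }
}
```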

And finally this TestFixture:
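Roughly like this (sketch; the SetUp assumes EF’s default pluralized table names):

```csharp
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class CascadeTests
{
    [SetUp]
    public void ClearDatabase()
    {
        using (var context = new TestDbContext())
        {
            // Assumes EF's pluralized default table names.
            context.Database.ExecuteSqlCommand("DELETE FROM Children; DELETE FROM Parents;");
        }
    }

    [Test]
    public void Delete_will_cascade()
    {
        using (var context = new TestDbContext())
        {
            context.Parents.Add(new Parent { Child = new Child() });
            context.SaveChanges();
        }

        using (var context = new TestDbContext())
        {
            context.Parents.Remove(context.Parents.Single());
            context.SaveChanges();

            Assert.That(context.Parents.Count(), Is.EqualTo(0), "Parent");
            Assert.That(context.Children.Count(), Is.EqualTo(0), "Child");
        }
    }
}
```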

Make sure the “TestDb” database is available on your “(LocalDb)\cascades” instance (or use another SQL Server instance and database). Then run the tests, and you’ll get:

Test Failed – Delete_will_cascade.
Message: “Child”.
Expected: 0, but was: 1.

All I want is for a “Child” to be deleted along with its “Parent” when the latter is deleted. I know I can use a SQL CASCADE, and I know I could manually remove the Child from the context, but I want EF to handle this automatically, damnit!
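(For reference, that manual workaround would be something along these lines; it’s exactly the bookkeeping I want EF to do for me.)

```csharp
// Requires "using System.Data.Entity;" for the lambda-based Include.
using (var context = new TestDbContext())
{
    var parent = context.Parents.Include(p => p.Child).Single();

    // Explicitly remove the Child before removing its Parent.
    if (parent.Child != null)
    {
        context.Children.Remove(parent.Child);
    }

    context.Parents.Remove(parent);
    context.SaveChanges();
}
```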

For sure I’ve missed the obvious solution. I was somewhat hoping it would come to me while writing this. But perhaps I just need a good night’s rest.



Nested Elastic Explorations – Part 1

After my previous post on what to explore next, I’ve dabbled with option 4, “Linux” (but got frustrated, so I set it aside), and worked some on option 2, “More Gulp!” (but that was mostly at work). For now, however, I think I’ve settled on putting some effort into learning Elasticsearch and Nest. So let’s start a series of posts on this venture. (No promises though; this “Part 1” post might end up being the only part…)

About this venture…

Elasticsearch can be seen as a NoSQL database, more specifically as a document database. I’ll be using my Bieb pet project (or at least its domain) for testing its features. Bieb’s domain should be familiar to everyone: books!

Specifically I’m writing this post because I quickly ran into a fundamental challenge with document databases, and I hope to gather my thoughts on the matter by writing about it. To set the stage, I’ll be talking about this part of the domain:

  • A Book has multiple Authors (of type Person).
  • A Person has authored multiple Books.

The classic many-to-many relationship example. The challenge then obviously is how to express this in the various tools.

Setting the coding stage…

Before I dive into the Elasticsearch bit, let me first describe how I got this going in C#, SQL, and NHibernate. First, the SQL bit:
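It comes down to this (simplified; Bieb’s actual schema has more columns):

```sql
CREATE TABLE Person (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(256) NOT NULL
);

CREATE TABLE Book (
    Id INT IDENTITY PRIMARY KEY,
    Title NVARCHAR(256) NOT NULL
);

-- The many-to-many table linking Books to their Authors.
CREATE TABLE BookAuthor (
    BookId INT NOT NULL REFERENCES Book (Id),
    PersonId INT NOT NULL REFERENCES Person (Id),
    PRIMARY KEY (BookId, PersonId)
);
```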

Run-of-the-mill stuff, with a many-to-many table. Obviously, we don’t want to see BookAuthor show up as a class in our code. That is, we want this kind of C#:
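That is, something like this (a sketch of the relevant bits):

```csharp
using System.Collections.Generic;

public class Book
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }

    // Book is the "main" entity: it owns the relationship with authors.
    public virtual ISet<Person> AllAuthors { get; set; } = new HashSet<Person>();
}

public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    public virtual ISet<Book> AuthoredBooks { get; set; } = new HashSet<Book>();
}
```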

As you can see, the Book is the “main” entity and the “boss” of the relationship with authors. Apart from the virtual keyword everywhere, little to no SQL or NHibernate know-how leaked into this domain.

The NHibernate mapping looks like this:
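Trimmed down to the relevant bits, it’s roughly this (hbm.xml sketch; column names are assumptions):

```xml
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   assembly="Bieb.Domain" namespace="Bieb.Domain.Entities">
  <class name="Book">
    <id name="Id"><generator class="identity" /></id>
    <property name="Title" />
    <set name="AllAuthors" table="BookAuthor">
      <key column="BookId" />
      <many-to-many class="Person" column="PersonId" />
    </set>
  </class>
  <class name="Person">
    <id name="Id"><generator class="identity" /></id>
    <property name="Name" />
    <!-- inverse="true": Person does not manage the BookAuthor table -->
    <set name="AuthoredBooks" table="BookAuthor" inverse="true">
      <key column="PersonId" />
      <many-to-many class="Book" column="BookId" />
    </set>
  </class>
</hibernate-mapping>
```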

The important bit, and the link to Elasticsearch, is the inverse="true" bit. With NHibernate, you have to indicate which entity is “in charge” of updating the many-to-many table. With document databases you have to do the same thing.

Moving to Nest & Elasticsearch…

Diving head first into throwing these entities into Elasticsearch, I tried to run the following Nest integration test:
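Something like this (sketch; it assumes a local node on localhost:9200 and the NEST 1.x-style client setup):

```csharp
using System;
using Nest;
using NUnit.Framework;

[TestFixture]
public class BookIndexTests
{
    [Test]
    public void Can_index_a_book_with_authors()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"), "bieb");
        var client = new ElasticClient(settings);

        var book = new Book { Title = "Ab Urbe Condita" };
        var author = new Person { Name = "Titus Livius" };
        book.AllAuthors.Add(author);
        author.AuthoredBooks.Add(book); // ...and there's the loop.

        var response = client.Index(book);

        Assert.That(response.IsValid);
    }
}
```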

It’ll fail, because Nest internally uses JSON.Net to serialize the Book, which means you’ll get this:

Newtonsoft.Json.JsonSerializationException : Self referencing loop detected with type ‘Bieb.Domain.Entities.Book’. Path ‘allAuthors[0].authoredBooks’.

A problem that was to be expected.

With SQL, self-referencing loops are just fine: they’re handled by the many-to-many table. In NHibernate it’s a minor nuisance, simply handled by the inverse="true" bit. With document databases, however, this is somewhat less trivial, in my opinion. In fact, it may be the hardest part of using them: how do you design your documents?

What I’d like to do…

The solution obviously is to break the self-referencing loop somewhere. Simply not serializing a Person’s authored books should do the trick. (On a side note, it occurs to me that graph databases would not need this trick, and would be great for a scenario with endless Book-Person-Book-… traversal, but that’s for another time.)

However, I don’t want to hard-code the loop-break into my domain object, e.g. by marking AuthoredBooks as not-serializable. The first and foremost reason is that this would mean my storage layer leaks into my domain layer. The second reason is that I may want to break the loop at different places depending on the Elasticsearch index I’m targeting. That is, I’ll probably have an index for Authors as well as Books, and want to break the loop in different places on either occasion.
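Just to be clear, that hard-coded loop-break would be trivially easy; JSON.Net even has an attribute for it:

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    // Breaks the loop, but leaks a persistence concern into the domain,
    // and forces the same break for every index.
    [JsonIgnore]
    public virtual ISet<Book> AuthoredBooks { get; set; } = new HashSet<Book>();
}
```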

Questions…

So some questions remain. How could we do this with JSON.Net without altering the domain layer? Or is there a way to do this with Elasticsearch + Nest? Or should we bite the bullet and have a separate set of DTOs mapped to and from our domain entities, representing how they’re persisted in Elasticsearch? Or do we need a different approach altogether?

Good questions. Time to go and find out!

All roads lead to Excel, even those from SQL

My first employer provided me with some valuable insight:

Microsoft Excel is the main competition for any piece of software.

Over the years this statement has proven true an alarming number of times. And it makes sense too. Everyone knows how to use Excel, and it’s extremely flexible, especially when you’re working with any kind of (tabular) data. In other words: all roads lead to Excel.

Roads from SQL also often lead to Excel, even though they’re not always pretty. Sure, if you’re on foot or horseback, with a limited amount of luggage, the road will be fine. However, here’s a particular scenario that obscures the path.

Scenario

These are the basic constraints:

  • Available tools: MSSQL 2012, SSRS, SSIS, Visual Studio 2012 & .NET 4.5.
  • Excel versions: either XLS (2003 and below) or XLSX (2007 and up, slightly preferred) will do.
  • Form of data: combination of normalized and denormalized data (see below).
  • Amount of data: tops out at 250,000 rows (times 20 when unpivoted).
  • Required response time: live exports that should run within seconds.
  • Databases: many instances each with the exact same schema but different data.

So there’s access to the latest and greatest Microsoft tools, and the option to include custom components. Free and open source components are preferred, but buying tools and components is also an option.

Data

Here’s a simplified version of how the data is modeled:

  • Person is a “flat” table, containing some columns that have “fixed” personal details.
  • Property and Value allow for custom (normalized) Person fields.
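In table form, that comes down to something like this (a sketch: PersonValue is my name for the link table, and the SortOrder columns back the ordering I’ll mention under the Bonus Objectives; see the SqlFiddle below for the real sample data):

```sql
CREATE TABLE Person (
    Id INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    Surname NVARCHAR(100) NOT NULL
    -- ...in reality some 30 "fixed" columns...
);

CREATE TABLE Property (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    SortOrder INT NOT NULL
);

CREATE TABLE [Value] (
    Id INT IDENTITY PRIMARY KEY,
    PropertyId INT NOT NULL REFERENCES Property (Id),
    Name NVARCHAR(100) NOT NULL,
    SortOrder INT NOT NULL
);

-- Which Person has which Value, plus optional free text.
CREATE TABLE PersonValue (
    PersonId INT NOT NULL REFERENCES Person (Id),
    ValueId INT NOT NULL REFERENCES [Value] (Id),
    CustomValue NVARCHAR(256) NULL,
    PRIMARY KEY (PersonId, ValueId)
);
```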

Here’s a visual of this simplified model:

Database model for Person, Property, and Value

You can view the SqlFiddle with some sample data. A typical query to get the data that’s going to be our starting point:
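Something along these lines (assuming the sketched PersonValue link table from above; the real query is in the SqlFiddle):

```sql
SELECT per.Id, per.FirstName, per.Surname,
       prp.Id, prp.Name AS PropName,
       val.Id, val.Name AS ValName,
       pv.CustomValue
FROM Person per
JOIN PersonValue pv ON pv.PersonId = per.Id
JOIN [Value] val ON val.Id = pv.ValueId
JOIN Property prp ON prp.Id = val.PropertyId
ORDER BY per.Id, prp.SortOrder, val.SortOrder;
```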

This will give output similar to this:

Id  FirstName  Surname  Id  PropName   Id  ValName      CustomValue
1   John       Doe      1   Trait      2   Bold         NULL
1   John       Doe      1   Trait      3   Easygoing    NULL
1   John       Doe      2   Eye color  4   Green        NULL
1   John       Doe      3   Pet name   7   Placeholder  Fluffybunny
2   Mark       Roe      1   Trait      3   Easygoing    NULL
3   Mary       Fisch    2   Eye color  6   Other…       Red-brown-ish
3   Mary       Fisch    3   Pet name   7   Placeholder  Ravi

Note that in reality I’m dealing with these figures:

  • About 30 columns on the Person table;
  • About 20 different Properties with about 6 possible Values on average;
  • Anywhere between 100 and 250,000 Persons;
  • Usually between 0 and 2 Values per Person per Property.

For one, this means that the normal output of the above query repeats a lot of redundant information (i.e. the 30-ish Person columns on every single row).

Target Output

The business requirement here when moving this data to Excel should be obvious: the data should be pivoted. Each “Property” should become a group of columns, with one column per “Value”. A table says more than a thousand words; this is the requested output:

                        Trait            Eye color             Pet name
Id  FirstName  Surname  Bold  Easygoing  Green  Other…         Placeholder
1   John       Doe      x     x          x                     Fluffybunny
2   Mark       Roe            x
3   Mary       Fisch                            Red-brown-ish  Ravi

Something along these lines is what the business users would like to see.

Bonus Objectives

Getting the target output is a challenge in itself. I’m not done yet though; here are some bonus objectives (with MoSCoW indications):

  • Properties and Values both have ordering, the order of columns Should respect that.
  • Any solution Should allow for some styling (fonts, borders, backgrounds). It’d be Nice to have further control, for example to enable a theme, alternate row coloring, etc.
  • I Would like to have a place for metadata (date exported, etc.) somewhere in the generated file.
  • Localization of the column headers (where applicable) would be Nice to have.
  • It’d be Nice to be able to reuse much of the solution in generating XML files instead of Excel sheets.
  • Any solution Must be solid and maintainable.
  • Any solution Must run on moderate hardware without hogging resources.

Current Solution

Right now, the above is accomplished using Reporting Services. This works decently well for datasets containing no more than a few thousand Person rows with related Property Values.

However, beyond about 3000 records performance quickly starts to degrade. This isn’t entirely unexpected, because Reporting Services isn’t really meant for this task (it’s much better at showing aggregates than at exporting large volumes of data).

Possible Solutions

There are many possible solutions. I’m currently considering a subset of them (some solutions merely for “benchmark” purposes):

  • SSIS packages. The tool seems meant for the job. I do hold a grudge against the tool, but maybe it’s time to get over that.
  • Dynamic SQL + generated RDLs. Use DynSQL to do the pivoting (a rough sketch of that part follows below this list). This requires generated RDL files, because the fields of a query must be known up front to SSRS.
  • Dynamic SQL + OPENROWSET + OleDB. Use DynSQL to do the pivoting, and export it straight to Excel using OleDB.
  • FOR XML queries into OpenXML. The basic idea: fast FOR XML queries, possibly an XSLT, generating OpenXML data, and plugging that into a basic XLSX.
  • ORM or ADO.NET into OpenXML using an XmlWriter. Something along those lines.
  • BCP directly into Excel files. Haven’t looked into this yet, but it may be an option.
  • SQL CLR. Not sure how this would work (if at all), but there might be options here.
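To give an idea of the Dynamic SQL direction, here’s a rough sketch of the pivoting part (it reuses the hypothetical PersonValue table and SortOrder columns from earlier, and naively assumes Value names are unique across Properties):

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the column list from the data, e.g.: [Bold],[Easygoing],[Green],...
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(val.Name)
    FROM [Value] val
    JOIN Property prp ON prp.Id = val.PropertyId
    ORDER BY prp.SortOrder, val.SortOrder
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

-- Pivot: an 'x' for plain values, the free text for custom ones.
SET @sql = N'
SELECT Id, FirstName, Surname, ' + @cols + N'
FROM (
    SELECT per.Id, per.FirstName, per.Surname,
           val.Name AS ValName,
           COALESCE(pv.CustomValue, N''x'') AS Cell
    FROM Person per
    JOIN PersonValue pv ON pv.PersonId = per.Id
    JOIN [Value] val ON val.Id = pv.ValueId
) AS src
PIVOT (MAX(Cell) FOR ValName IN (' + @cols + N')) AS pvt;';

EXEC sys.sp_executesql @sql;
```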

Now it’s time for me to try some of them, see which one fits the requirements best.

Pivot vs Unpivot

When mnemonics and learning by heart fail: write a blog post! So here is my own personal “which is which” reference post about the Pivot and Unpivot operations. First things first:

Example of the pivot and unpivot operation

Using the great Scipio Africanus and Hannibal as puppets, the above example shows the two operations in their most basic form: pivot turns rows into columns, and unpivot turns columns back into rows. The pivot operation would be along these lines in T-SQL:
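For instance (with made-up numbers for the two generals):

```sql
-- Source data: one row per general per year.
CREATE TABLE Battle (General NVARCHAR(50), [Year] INT, Wins INT);
INSERT INTO Battle VALUES
    ('Hannibal', 218, 2),
    ('Hannibal', 216, 1),
    ('Scipio Africanus', 202, 1);

-- PIVOT: the distinct General values become columns.
SELECT [Year], [Hannibal], [Scipio Africanus]
FROM Battle
PIVOT (SUM(Wins) FOR General IN ([Hannibal], [Scipio Africanus])) AS pvt;
```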

The query above is also available as a SQL Fiddle, so you can toy around with it some more; many more complex queries are possible with these operations.

Enjoy!