All roads lead to Excel, even those from SQL

My first employer provided me with some valuable insight:

Microsoft Excel is the main competition for any piece of software.

Over the years this statement has proven true an alarming number of times. And it makes sense too. Everyone knows how to use Excel, and it’s extremely flexible, especially when you’re working with any kind of (tabular) data. In other words: all roads lead to Excel.

Roads from SQL also often lead to Excel, even though they’re not always pretty. Sure, if you’re on foot or horseback, with a limited amount of luggage, the road will be fine. However, here’s a particular scenario that obscures the path.

Scenario

These are the basic constraints:

  • Available tools: MSSQL 2012, SSRS, SSIS, Visual Studio 2012 & .NET 4.5.
  • Excel versions: either XLS (2003 and below) or XLSX (2007 and up, slightly preferred) will do.
  • Form of data: combination of normalized and denormalized data (see below).
  • Amount of data: tops 250.000 rows (times 20 when unpivoted).
  • Required response time: live exports that should run within seconds.
  • Databases: many instances each with the exact same schema but different data.

So there’s access to the latest and greatest Microsoft tools, and the option to include custom components. Free and open source components are preferred, but buying tools and components is also an option.

Data

Here’s a simplified version of how the data is modeled:

  • Person is a “flat” table, containing some columns that have “fixed” personal details.
  • Property and Value allow for custom (normalized) Person fields.

Here’s a visual of this simplified model:

Database model for Person, Property, and Value

 

You can view the SqlFiddle with some sample data. A typical query to get the data that’s going to be our starting point:

This will give output similar to this:

Id FirstName Surname Id PropName Id ValName CustomValue
1 John Doe 1 Trait 2 Bold NULL
1 John Doe 1 Trait 3 Easygoing NULL
1 John Doe 2 Eye color 4 Green NULL
1 John Doe 3 Pet name 7 Placeholder Fluffybunny
2 Mark Roe 1 Trait 3 Easygoing NULL
3 Mary Fisch 2 Eye color 6 Other… Red-brown-ish
3 Mary Fisch 3 Pet name 7 Placeholder Ravi

Note that in reality I’m dealing with these figures:

  • About 30 columns on the Person table;
  • About 20 different Properties with about 6 possible Values on average;
  • Anywhere between 100 and 250.000 Persons;
  • Usually between 0 and 2 Values per Person per Property;

For one, this means that the normal output of mentioned query has a lot of redundant information (i.e. the 30-ish Person columns).

Target Output

The business requirement here when moving this data to Excel should be obvious: the data should be pivoted. Each “Property” should become a group of columns, with one column per “Value”. A table says more than a thousand words; this is the requested output:

Trait Eye Color Pet name
Id FirstName Surname Bold Easygoing Green Other… Placeholder
1 John Doe x x x Fluffybunny
2 Mark Roe x
3 Mary Fisch Red-brown-ish Ravi

Something along these lines is what the business user’s would like to see.

Bonus Objectives

Getting the target output in itself is a challenge. I’m not done yet though, here are some bonus objectives we have (with MoSCoW indications):

  • Properties and Values both have ordering, the order of columns Should respect that.
  • Any solution Should allow for some styling (fonts, borders, backgrounds. It’d be Nice to have further control, for example enable a theme, alternate row coloring, etc.
  • I Would like to have a place for metadata (date exported, etc) somewhere in the generated file.
  • Localization of the column headers (where applicable) would be Nice to have.
  • It’d be Nice to be able to reuse much of the solution in generating XML files instead of Excel sheets.
  • Any solution Must be solid and maintainable.
  • Any solution Must run on moderate hardware without hogging resources.

Current Solution

Right now, the above is accomplished using Reporting Services. This works decently well for datasets containing no more than a few thousand Person rows with related Property Values.

However, beyond about 3000 records performance quickly starts to degrade. This isn’t entirely unexpected, because Reporting Services isn’t really meant for this task (it’s much better at showing aggregates than at exporting large volumes of data).

Possible Solutions

There are many possible solutions. I’m currently considering a subset of them (some solutions merely for “benchmark” purposes):

  • SSIS packages. The tool seems meant for the job. I do hold a grudge against the tool, but maybe it’s time to get over that.
  • Dynamic SQL + generated RDLs. Use DynSQL to do the pivoting. This requires generated RDL files because the fields of a query must be known up fron to SSRS.
  • Dynamic SQL + OPENROWSET + OleDB. Use DynSQL to do the pivoting, and export it straight to Excel using OleDB.
  • FOR XML queries into OpenXML. The basic idea: fast FOR XML queries, possibly an XSLT, generating OpenXML data, and plug it in a basic XLSX.
  • ORM or ADO.NET into an OpenXML using an XmlWriter. Something along these lines.
  • BCP directly into Excel files. Haven’t looked into this yet, but it may be an option.
  • SQL CLR. Not sure how this would work (if at all), but there might be options here.

Now it’s time for me to try some of them, see which one fits the requirements best.

BattleTop

Exactly one month ago I wrote about Google Code hosting. At the time I wasn’t ready to divulge the project I was using to test it, but today’s different. Today, I have decided to put BattleTop in Beta!

BattleTop is a responsive single-page web application to assist in bookkeeping things like Characters, Initiative, Hit Points, Conditions, et cetera, during D20-based table top RPG sessions.

BattleTop Beta Logo
BattleTop Beta Logo

You can view it, clone it, and provide pull requests for it at Google Code. Mind you, it is still in beta, meaning there’s many rough edges and bugs to be found. I have been and will be dogfooding it during our own table top RPG sessions, so I’ll be sharing your frustrations about bugs and time permitting I will be fixing things.

The current list of features:

  • Track characters. Add and remove monsters, NPCs, PCs, and environment initiatives. You can also reset the list to the party, or to a blank, new list.
  • Track initiative. Keep initiative and initiative modifiers, sort the list by initiative, keep track of ready and delay actions.
  • Track conditions. Each character has its own list of conditions which you can add, remove or change, with a number of turns to it (so they wear off automatically).
  • Track hit points. Each character optionally has a number of hit points. You can deal damage or apply healing to change the hit point amount.
  • Save/Load. Using the LocalStorage API BattleTop will save your state every 5 seconds. If you navigate away or re-open the page on your next play session the old state will be there.

Note: because BattleTop extensively uses many modern features (html5 semantic markup, css3 features, modern JS such as LocalStorage), only modern browsers will be supported.

Here’s a general view of what it currently looks like:

 

BattleTop 0.9.0 Setup Mode
BattleTop 0.9.0 Setup Mode

 

BattleTop 0.9.0 Update Hit Points
BattleTop 0.9.0 Update Hit Points

The horrifying state of free Android initiative-keeping apps was what triggered me to create BattleTop. I’ve decided on a HTML5 app as opposed to a native app because (a) it would be easier for me to create something in a short amount of time, and (b) to keep it portable across devices and operating systems with little effort. Hopefully BattleTop will help or inspire others as well.

Pivot vs Unpivot

When mnemonics and learning by heart fail: write a blog post! So here is my own personal “which is which” reference post about the Pivot and Unpivot operations. First things first:

 

Example of the pivot and unpivot operation

 

Using the great Scipio Africanus and Hannibal as puppets, the above example shows the two operations in their most basic form. The pivot operation would be along these lines in T-SQL:

The query above is also available as a SQL Fiddle, to toy around with it some more, as there are many more complex queries possible with these operations.

Enjoy!

Sassy Styles

In my previous post I ranted about the way the design community seems to violate the DRY principle. Let’s revisit the code (and Repeat the code, I know, I know):

What’s going on here? Well, to get to our beloved Em measurements, we apparently need a calculation based on our body’s font-size (24px) and the h1 target font-size (16px). This “would make future adjustments much, much easier”.

It’s not that I dislike this really, but more that I despise (having to do) this. We’re saying “1.5em” here, only we’re doing it twice.

Last week I’ve tried one of the solutions to this: SASS. And let me say, this feels like it could be love at first sight! With SASS, the above snippet will quickly transform into the following:

Much better, no? We’ve now only stated once what the font-size should be: a certain fraction × 1em. I’m a little bit disappointed about needing the “* 1em” there, but hey: it’s a great reason to ask another Stack Overflow question.

Anyways, SASS doesn’t stop here. It will add more improvements, one of particular importance to the above snippet. Consider this:

What’s up with the additional lines of code? Isn’t that extra code bloat? Well no, those lines help us achieve two very important goals:

  1. Our calculation is now much more meaningful, and will now truly “make future adjustments much, much easier”.
  2. We can reuse those variables in our style sheets. In the somewhat contrived example above it doesn’t really shine, but you can surely imagine this is a great benefit to the entire style sheet.

For my current pet project I’ve tried SASS in a feature branch, but I’ve already closed that branch: it was merged into the main branch after only a few hours. With this I’m indeed implying there’s a very friendly learning curve for SASS.

And yes: I’m also implying that you should try it for yourself! There’s many more nice features I haven’t even mentioned yet. And if I didn’t convince you, perhaps the two-page tutorial will!

Measurements in Responsive Design

My wife called me out for looking at CSS through a pair of Programmer’s Glasses™. She hastily added this could well be a Good Thing, and I suppose I’ll just interpret it as a compliment. In fact, I must say I agree, feeling more like a “developer” than a “designer”.

Let’s first look at how we ended up at this name-calling. Here’s a code snippet from Ethan Marcotte‘s book Responsive Web Design from the A Book Apart series:

He then goes on to state that:

I usually put the math behind my measurements in a comment to the right-hand side of the line, which makes future adjustments much, much easier for me to make.

At first glance, this makes sense. However, in the long run, this feels really weird to me: it smells like code duplication. The actual result and the calculation in the comment both express the “what“, just in different form. It reminded me of Clean Code, where a whole chapter is dedicated to comments, and I’d think this would fall under the Bad Comments section (“Noisy Comment”, perhaps?).

Now I’m currently thinking I must be wrong: everyone who is something in responsive design is generating this type of sample code. Maybe it is because I should read this kind of code like:

Both forms describe what the font-size should be. Form 2 is probably “best” from a Clean Code point of view (as it’s most descriptive), but unfortunately only form 1 is valid (plain) CSS. As a compromise both forms are kept.

So, what are the options for improving things? There’s at least a few I can currently see:

  1. Combine “Form 1 and 2”. Accept that you’ll need discipline to keep the measurement and comment in synch. This is what the Responsive Design community leaders seem to practice.
  2. Just use “Form 1”. You’ll loose (or never have) the benefit of understanding your measurement.
  3. Use a CSS pre-processor. There are Sass and LESS, the most well-known ways to introduce (among others) calculations in stylesheets.
  4. CSS3 modules. The CSS3 Calculations module introduces calculations, and the Variables module may even take this one step further (as far as preventing code duplication is concerned).

To be honest, this list is currently my reverse order of preference. I’d love for option 4 (the CSS3 modules) to become a success. Until then, I’m bound to investigate the CSS pre-processors, because option 1 and 2 are both crappy, in my opinion.

CodePlex Hg Hosting

This is a follow-up to my previous post on Hg Hosting Providers. I found that on the surface all four of them were very similar. From the four providers I investigated, CodePlex overall felt most intriguing, so I decided to start by trying out that one.

Note: I participate in another project, which I will be uploading to Bitbucket (because that’s the only one that supports small private projects at no charge), so I may be doing a follow-up on that provider too.

Sign Up and Project Creation

The CodePlex home page lures you into creating a project with a big, purple, Metro-style button:

CodePlex Step 1 - Create Project Button
Step 1 – Create Project Button

When you hit the button you’ll be asked to either sign in or register:

CodePlex Step 2 - Sign in
Step 2 – Sign in

If you don’t have an account yet (like I did) you can register for one on the spot. I decided to create an account linked to my Windows Live ID:

CodePlex Step 3 - Register
Step 3 – Register

The hardest part is the Captcha: damn those things can be a challenge! After finally completing this mini-game, it turned out CodePlex had forgotton all about my intention to “Create a Project”, and instead presented me with my personal (and very empty) project home page:

CodePlex Step 4 - Account page
Step 4 – Account page

Luckily, there’s another (though less obvious) “Create a Project” link on the page. There’s only a small number of fields you have to complete:

CodePlex Step 5 - Create Project
Step 5 – Create Project

The whole point for me was to choose Mercurial hosting, but it’s noteworthy that both TFS (which also supports SVN clients) and Git are an option too. Hit “Create” and if all goes well you’ll be directed to the brand new project’s home page:

CodePlex Step 6 - New Project Overview
Step 6 – New Project Overview

The project has now entered the “Setup Period”. This gives you 30 days to set up the project, i.e. before you have to go live. This was in fact a welcome surprise to me, because since this was my first time coordinating a hosted project the setup period allows me to calmly check out all the features.

Importing the Existing Repository

Keeping in mind that eventually I would be hosting the code online, I had already started with a Hg repository, committing my changes locally. There are several advantages to using version control while still flying solo, amongst others having backups and traceability.

I already suspected it wouldn’t be too hard to merge an existing repository into a new hosted project, and I even got some response to my question on Twitter, indicating it should be possible. The only thing I was worried about is that the username for commits would be “wrong”, because my computer user name “Jeroen” is different from my new CodePlex user name “jeroenheijmans”. To get everything to work I took the following steps:

  1. A simple copy/paste back-up of the existing repository, just to be safe.
  2. With some effort I got this Hg convert trick to work, changing the user name in commits for the existing repository to my CodePlex username.
  3. Make a clone (a.k.a. check out) of the project repository from CodePlex to a new local folder.
  4. Copy the existing repository over the clone from CodePlex.
  5. Some of the hidden files in the .hg folder have to be overwritten. I double checked all files with Notepad++, and in all (my!) cases the clone’s file could be safely overwritten.
  6. Using TortoiseHg I now pushed all the changes. In the previous step the CodePlex URL was probably lost, but that was easily remedied by entering it once more.

To be honest, in hindsight I don’t know if step 4 and 5 are really necessary. I guess you could safely try just pushing your existing repo to CodePlex: if it doesn’t work surely you’ll get an explanatory error message?

Either way, after completing the above steps my CodePlex project contained a complete history of 40+ commits, only minutes after setting up the project!

CodePlex Features Overview

After importing the source code it was time to investigate the various CodePlex features. Given the 30 day setup period I was able to check out all the features without having to worry others may see my silly mistakes. Here’s my first impression of the available features:

  • Home is the first thing you see if you go to the project URL (which is a nice subdomain of CodePlex: http://yourproject.codeplex.com). It mainly contains some text to explain what the project is all about, and you’re invited to edit it (with Wiki-style markup). Worked nice and easy for my simplistic purposes.
  • Downloads is the section where you can offer releases, which can have various properties as well as attachments such as binaries or installers. So far I’ve only created two “Planned Releases”, which worked pretty straightforward.
  • Documentation is Wiki-based. So far I’m headed straight for the quintessential documentation pitfall: I’ll create it once the product’s finished.
  • Discussions looks like a forum-based commenting area. Given that the project is one we do together with friends we may not get to use this option much.
  • Issue Tracker is a very basic work-item system. It contains the bare essentials, but not much more. This was a bit disappointing to be honest, I would at least have liked to have some basic formatting, but apparently that’s still a work item on CodePlex itself.
  • Source Code is the section where… well… you can find the source code! It shows all commits in descending order, and even though I haven’t tried it yet it also shows forks and pull requests. If you click on a commit you get to see the diffs for changed files.
  • People shows all the coordinators, developers and editors on the project. It’s also the place to invite your team mates to the party!
  • License is where you can view and change the license for your project. CodePlex allows you to choose from ten different OSI licenses. I’m leaning towards picking the BSD license, but this license stuff feels like a snake pit (with it’s own acronym: IANAL).

So far this list of features seems pretty sufficient. At least: I haven’t “missed” a feature so far.

Conclusions

All together I’m pleasantly surprised by how easy it was to set all this up. I’m not sure (yet) how it actually compares to the other hosting providers I considered, but so far I’m happy I chose CodePlex.

I’m fairly certain our project will be published some time soon, and when it does you can find it at bieb.codeplex.com: join the party!

Comparing Hg Code Hosting Providers

Mercury ElementMost usually, I’m a very organized person. Even before I was programming (when I was making Hero Quest maps in Q&A and WordPerfect) I would still want periodic backups of my data. Given all that, I’ve remained remarkably oblivious of Version Control for quite a long time. Here’s an overview of my VCS history:

  • 1990 – 1992 Derp derp, playing Commander Keen and friends.
  • 1992 – 2003 Copy-paste-style backups, periodically.
  • 2003 – 2005 Visual Sourcesafe. The horror…
  • 2005 – current SVN and TortoiseSVN. Much better!
  • 2011 – current Mercurial (with TortoiseHg), and a dash of Git.

On a coding project with friends, one of them suggested we’d use one of the DCVS systems: either Hg or Git. After a short debate (and after reading endless flamewars between the two) we decided on trying Hg. My “re-education” started with reading the HgInit tutorial by Joel Spolsky. Here’s a quote that stuck with me:

It turns out that if you’ve been using Subversion, your brain is a little bit, um, how can I say this politely? You’re brain damaged. No, that’s not polite. You need a little re-education.

True as it is.

To be honest though: I still don’t mind using SVN. At work we have an existing repository from which we’ll probably not be switching, and it even has some advantages over Hg. However, for any new project I would choose a DCVS, either Git or Hg, depending on the circumstances.

So for my latest pet project I’ve started a Hg repository as well. So far I’ve just been committing locally, using it as a backup and history mechanism. However, with friends and family joining on the project I will probably be moving the repository to an Open Source Hosting Provider. There’s a few that came to mind, and I’m currently considering four of them. Even though there’s a decent Wikipedia comparison article, I still decided to make my own comparison table with features I find interesting:

Google Code Bitbucket CodePlex SourceForge
Source Control Hg, Git, SVN Hg, Git Hg, Git, SVN, TFS Hg, Git, SVN
Issue tracker Custom JIRA Custom Custom
Wiki MoinMoin-based markdown Creole-based markdown Yes Daring Fireball-based markdown
Forum No No Yes Yes
Private projects No Yes No No
Licenses Any, single-licensed ? 10 OS licenses available Any OS license
Authentication Google Account Bitbucket account Codeplex + optional Win. Live SourceForge account
Projects 250,000+ 93,000+ 28,000+ 350,000+

Currently I’m leaning somewhat towards CodePlex, for no particular reason or rationale. Or perhaps I’ll end up trying them all before deciding. Either way, I’ll sleep another night on it, for now.

Can haz layout

Contains IE Hack This weekend I decided to deep dive in a StackOverflow question on the combination of pseudo-elements and the IE8 (and lower) filters. Even though it seemed to me the question couldn’t be answered with 100% certainty, I still gave it a shot. A similar but better answer got accepted (damn you, “ScottS”!!), but that’s besides the point. The most important thing I learned from answering this question, is understanding the magic in IE (especially version 8 and below). So, for reference, here are the important links of articles that made me understand IE a bit better:

The kicker from all this research is that I now finally understand this remotely related line of CSS I’ve seen (and even used) a lot:

This is a CSS hack to get some versions of IE to behave nicely. So far, nothing new. However, after reading the above articles, I finally understand what it does: it  forces the hidden IE property hasLayout to true. And that’s useful for quite a few cases where you’d expect things to have layout, but when they don’t by default.

Hack or not, I’m glad I understand things a wee bit more now.

CSS syntax naming conventions

My next Stack Exchange Challenge post will most likely be about the Programmers SE. One part of the challenge is to actually ask a question I have on the topic. The question I came up with (to be honest, this has been bothering me for months now) took quite some time to write down carefully. So, as I don’t have my next SE Challenge post ready yet, I decided to cross-post my question here on my blog as well.

The Question: what are the practical considerations for the syntax in class and id values?

Note that I’m not asking about the semantics, i.e. the actual words that are being used, as for example described in this blogpost. There are a lot of resources on that side of naming conventions already, in fact obscuring my search for practical information on the various syntactical bits: casing, use of interpunction (specifically the - dash), specific characters to use or avoid, etc.

To sum up the reasons I’m asking this question:

  • The naming restrictions on id and class don’t naturally lead to any conventions
  • The abundance of resources on the semantic side of naming conventions obscure searches on the syntactic considerations
  • I couldn’t find any authorative source on this
  • There wasn’t any question on SE Programmers yet on this topic

Some of the conventions I’ve considered using:

CSS naming conventions
CSS naming conventions
  1. UpperCamelCase, mainly as a cross-over habit from server side coding
  2. lowerCamelCase, for consistency with JavaScript naming conventions
  3. css-style-classes, which is consistent with naming of css properties (but can be annoying when Ctrl+Shift+ArrowKey selection of text)
  4. with_under_scores, which I personally haven’t seen used much
  5. alllowercase, simple to remember but can be hard to read for longer names
  6. UPPERCASEFTW, as a great way to annoy your fellow programmers (perhaps combined with option 4 for readability)

And probably I’ve left out some important options or combinations as well. So: what considerations are there for naming conventions, and to which convention do they lead?