Dates clusterfuck: MSSQL vs Excel
Dates are usually the most error-prone element of various computer systems. There are so many different date formats, standards and representations. On top of this there are different calendars, too. Overall one could probably get a PhD on the topic without even scratching the surface of it.
Today I am going to focus on two tiny (but significant) aspects of the above, both related to differences between how Excel and Microsoft SQL Server store and process date related data.
Let’s start with the basics: in most cases a date in a computer system is represented as an integer value denoting a number of days between the date we need and some hard-coded date in the past.
In case of the SQL Server that hard-coded date in the past is 1-Jan-1900 (also known as „SQL Server Epoch”). So when you convert a date of 1-Jan-1900 into Integer, you get a nice, round zero:
Now, let’s see what the same date looks like from Excel perspective:
As we can see there’s a perfect situation for an off-by-one error: internally SQL Server stores the date of 1-Jan-1900 as 0 (zero) while Excel stores it as 1 (one). If you forget this subtle difference, you will be very surprised sooner or later. I learned this one in a quite painful way, first having to dig through hundreds of lines of VBA and SQL code and then to stare bluntly at the code for about an hour, with a herd of angry business users sitting on my back, waiting impatiently for some apparently crucial report. Eventually I got enlightened.
Why, for God’s sake, would the same company come up with two completely distinct integer representations of the date data type – I haven’t got a clue and I am not going to waste any more time on this one. So just remember: there’s one day difference between Excel and SQL Server when it comes to internal date representations.
Now let’s have a brief glance at another nice „feature”: calculating future or past points in time. Let’s say we need to add 10 days to today’s date:
In SQL Server:
So far, so good. Because internally a date is just a number of days, adding 10 to the current date works as expected in both systems.
How about months? What if we need to calculate a date 5 months from now?
Here’s when the fun begins…
In SQL Server:
This is brilliant! SQL Server has a nice DATEADD operator that takes any reasonable time interval as the first parameter (actually we could have used this function in the previous example, to add 10 days), then the number of intervals to be added, and finally a date to add the intervals to. Clean and simple. But for some equivocal, mysterious, mad reasons Microsoft decided to create a separate EDATE function in Excel just to add / subtract months to / from a date. Why, o my? Why?
Things get even more charming when it comes to years. As expected, if we want to get a date, let’s say, 25 years from now, we will use the same DATEADD operator in SQL Server:
But not in Excel. Here we need to juggle a good couple of eggs in order to achieve the same:
So basically we’re extracting the year, then adding 25 to it, then extracting the month, then the day, then finally assembling them three back together into a date type. And we need to deal with leap years (they have impact on the above formula; not a huge one but nevertheless).
To sum up today’s article: I love Excel for many, many things. Flexibility. Script-ability. Virtually infinite number of possibilities to get things done. But I loathe it for the way it deals with dates.
Shame on you, Excel!