Preamble
When making an application world-ready, it is important to isolate localizable resources but it is also important to identify those resources that are culture-invariant. That is to say, those resources that will be identical regardless of the culture the application is running in. Let me tell you a story to illustrate the point.
During 1997 I was developing a photo-real colour rendering application for paint retailers. However, when we tested the application on French or Spanish versions of Windows all the colours rendered to black. Investigation showed that the problem lay with our file data. We were storing colour reflectance data in text format files: suffice to say that these contain either 16 or 32 floating point values, e.g. (15.362,3.297,...). The problem was that we were using the instrinsic VB library functions to read this data. Now, let me be clear - the problem was not with VB, rather with our initial awareness of the functions. It transpired that VB has some locale-aware functions, and we were using these. So, VB was parsing the data using the current locale. The root cause was that French and Spanish use a comma as a decimal point… If you glance at our data formatting, it is immediately apparent what went wrong!
There were two potential resolutions considered:
- Change the separator from a comma to some other character that no culture uses as a decimal point, for example a tilde (~). This solution would have worked fine, because the VB libraries gracefully fell back to English decimal points when the locale-specific decimal point failed. However, we could not do that because we would have two file format implementations: English and non-English… not good.
- We wrote our own string parser. This meant significantly more dev effort and testing than should have been required but it meant that our files were valid anywhere.
I tell this story as a real-world example of the pain that can be encountered by not addressing globalization issues.
Formats
DateTime values are difficult to handle in a truly consistent, reliable manner. They have amongst the most variable representations of any value. I don't propose to enter into a deep discussion of these variabilities, but consider the following list of representations of the current LongDateTime of my writing this post:
- en-GB
- 27 September 2003 16:34
- en-US
- Saturday, 27 September, 2003 4:34 PM
- fr-FR
- samedi 27 septembre 2003 16 h 34
- fr-CA
- 27. septembre 2003 16:34
There are two factors to consider here:
- A DateTime value represented as above does not fully express all of the information we need. Why? Because in order to express the information fully we are missing TimeZone data. I am writing this post in England. The representation is entirely dependent upon the fact that I am in the BST time zone. If I display this data in New York, it will be inaccurate by 5 hours unless I factor in the time difference. Indeed, if the time were 01:34 then not only would the time be inaccurate but also the day part (indeed the month part and year part might also be inaccurate).
- The representation is culture-specific. In the examples above, you could deduce the culture if you wanted to. However, consider the following classic UK-US date problem: is 01/02/2003 the 1st February (UK) or the 2nd January (US)?
The way to resolve these issues is to represent the DateTime value in a culture-invariant manner. You can do this by always storing the value in a specified culture format, for example en-US. However, this is not standards compliant and requires that third party consumers of your data need to know the culture-specificity. The answer to this is to use a standard culture-invariant format. There are a number of these, for example NATO format (an example of which, I think, is 01 JAN 2003 16:34:00) but the best one to use is the ISO 8601 standard. Discussions of this standard can be found at the W3C, University of Cambridge, and University of Wellington. In this standard format, the date/time string would be 2003-09-27T16:34:00. In addition, this standard provides a mechanism for representing the timezone. The simplest way to make the data totally culture-invariant is to factor the data to GMT which is represented by a 'Z' suffix. So the date of this post would be represented as 2003-09-27T15:34:00Z.
So, this sounds like quite a lot of work....
But no! We are using .net, so our life is easy:
Here is a factoid about the 'Z' at the end of universal time representations. The Z stands for the "zero meridian", which goes through the Greenwich Royal Observatory in London, and it is also commonly used in radio communication where it is pronounced "Zulu" (the word for Z in the international radio alphabet). Universal Time (sometimes also called "Zulu Time") was called GMT before 1972, however this term should no longer be used. Since the introduction of an international atomic time scale, almost all existing civil time zones are now related to UTC, which is slightly different from the old and now unused GMT.
By the way, the reason why "Zulu" is used is that 'Z' is pronounced zee in the US whereas in the UK we pronounce it zed (oh! the joys of localization).
Resolving a SQL Server DateTime FormatException
Once again, the strength of test-first development asserts itself.
Earlier this month, I coded a test to assert that table data in a SQL Server is written/read correctly. I have a table with a "Created" column of type datetime. Here is a cutdown version of the 'read' stored procedure:
The test for this procedure simply asserts an NUnit [Test] against a DataSet:
I wrote the core code (omitted, as it isn't relevant to this post) and the test passed. All was well (so I thought) so I moved on to other code.
Then, today, this test failed with the following output:
Hmmm, I thought, what's up? I had not changed either the test or the core code, so why had tests started to fail?
I investigated the strings that were being parsed. Here is an example: '15/08/2004 13:05:52'.
Aha! The problem is immediately apparent... The first digit was being treated as a month. This implies that CultureInfo.InvariantCulture uses US date formats (and my dates are in UK format). Therefore until '12/08/2004' the string was being successfully parsed, but as soon as the day incremented beyond 12 it failed. The problem is, in essence, that I was not using truly culture-neutral date strings.
How to fix this? Well, the nasty solution is to require the database to be set to use UK date formats. So I immediately discounted that. I needed a solution that would work regardless of the database settings.
The answer is to have the database return UTC date format strings. Thankfully, SQL Server provides support to convert a datetime to an ISO 8601-compliant string. So, I changed the stored procedure to:
Now, the procedure returns a string such as: 2004-08-15T13:30:07.500.
However, the test code also needs amendment to properly handle the ISO 8601 date format:
These changes solved my problem and the test passed again. All is well.