I seem to have spent a large part of the last year ranting about people who don’t use database metadata properly. Usually I’m talking about things like primary keys, foreign keys and unique indexes, but recently I’ve come across a far more basic level of idiocy - using the wrong data types.
Currently part of my work involves taking data dumps from one database and loading it into another database (and then generating some reports from the data). Some of the fields involved contain dates. So being a sensible programmer I’m storing them in DATETIME columns in my database. It’s simply a case of working out what format the other database uses as its default export format for datetime values and converting that to my database’s default import format.
But, of course, it’s never that easy in real life. It turns out that the dates in my source database are stored in text fields. And (even worse) users edit those dates using unvalidated text inputs.
This gives me two problems. Firstly, there is the data that isn’t a date. It seems that the input forms are insisting on a value. So when people don’t know (or can’t be bother to find) the date, they are putting in values like “Unknown”, “n/a” or even just “–”. Or they might only have a vague idea of the date so they’ll put “July” or “Q3 2006″.
The second problem is the people who do put in valid dates - but in inconsistant formats. Sometimes it’s “2006-06-13″, sometimes it’s “13 Jun 2006″, sometimes it includes the time, sometimes it includes the day of the week.
Luckily, Perl’s Date::Parse module handles most of these formats. There’s one thing that it can’t deal with though. There is no way of knowing what date is represented by “12-06-2006″. As far as I’m concerned, it’s yesterday (12th June), but it’s an international organisation and the date could be in the illogical US middle-endian format. There’s just no way of knowing (well, short of tracking down the person who entered the date).
All this could have been avoided if the data had been stored in a DATETIME column in the source database. And the users should not be given free-form text fields to enter dates. These problems have turned what should have been a simple task into a major undertaking. And I’m not the only person taking data from this database. I’m sure that other people have had to jump through the same hoops as me.
Databases are powerful tools for describing and storing data. By storing everything in a string you’re limiting the power of the tool. You should always use the most appropriate storage data type for the data that you are storing.


What's also fun is databases which have a separate date and time column. In and of itself fine if they're not related. However if they are, you have to combine them for selection and sorting each time. I've had the pleasure once and now make it a point to ALWAYS create a datetime column even if the time isn't required (it can be set to 00:00:00 if unknown).
Sometimes, though, customers do need to enter vague date data such as the examples you gave. If you're dealing with the heritage sector, for example, you may not know exactly when a certain object was created; only that it was 'circa 1916', or 'June 1964', maybe even '1200-1450'. On other occassions, you may know the exact date.
As the 'calendar' date select pop-up thing is so popular these days (on any flight/hotel site, etc), I think it would be interesting for someone to take a shot at standardising (or at least creating) a usable date-entry system whereby the user could choose a specific date (or date time), OR a more vague time/timespan (such as those I mention above). Ideally what we then need is for databases to have an ISO 8601 style date field (that can order/sort correctly, etc.), where we can store more complex date/time data (e.g. periods), rather than specific values.
Dan,
You're right of course. I've done a lot of genealogical research into my family so I know the importance of vague dates. I forgot that when writing this entry.
In most cases tho', I think that's not necessary. In the few cases when it is, perhaps it should be modelled with two fields - an exact date field and a "date description field" for vague dates. I like your idea for a new date field, but realistically that's not going to be an option for some considerable time.
"There is no way of knowing what date is represented by "12-06-2006″."
I propose that there may be a way to guess. If you have a user associated to the row, it could be possible to examine the other rows entered by that user and then assume that the format is consistent with those dates when it is possible to determine the date. Hard to automate but not impossible.
I've got a better idea. Let's just cut the fingers off anyone who uses a middle-endian date.
The American form is logical when you understand that
whereas us Brits say '12th of June', most Americans
say 'June 12th'.
So we use '12-06', they use '06-12'.
John,
I understand completely why they do it. And I think it's perfectly acceptable to say either "12th January" or "January 12th". I think I use both of these about equally.
The problem is when you say "12/6" or "6/12". I know people in the US say that, but when you put it into an international computer system it just leads to confusion.
Either use the name of the month, or use the ISO date format of YYYY-MM-DD. Anything else is open to misinterpretation.
What about using Unix timestamps? They are universal, easily portable, and make date calculation/comparison really easy.
Unix (or any native style) timestamps have their own problems. For example, they are all in UTC, so not only do you lose any timezone information (eg, the timezone an event occured in) but you lose the ability to handle things like the floating timezone.
If you have to store a datetime in an untyped format (i.e. string) then you should always do it in ISO format. That's what it was invented for.
Dave,
I quite agree with the use of the ISO format.
It sorts correctly even if it is in a text column.
Unfortunately I always get to use databases that have
been designed by 'experts'.
The other problem with Unix timestamps is that if you want to use dates earlier than 1970 then you have to use negative numbers - and many systems don't handle that well.
My favroite is year month day, all run together - 20060623 because it sorts nicely, even as text.
Just a thought, but why not the "star date"? That way we will be really ahead of the curve! Not to be toungue in cheek, but really folks, use DATETIME. Until warp drive arrives, it is the most direct date object.