I seem to have spent a large part of the last year ranting about people who don’t use database metadata properly. Usually I’m talking about things like primary keys, foreign keys and unique indexes, but recently I’ve come across a far more basic level of idiocy - using the wrong data types.

Currently part of my work involves taking data dumps from one database and loading it into another database (and then generating some reports from the data). Some of the fields involved contain dates. So being a sensible programmer I’m storing them in DATETIME columns in my database. It’s simply a case of working out what format the other database uses as its default export format for datetime values and converting that to my database’s default import format.

But, of course, it’s never that easy in real life. It turns out that the dates in my source database are stored in text fields. And (even worse) users edit those dates using unvalidated text inputs.

This gives me two problems. Firstly, there is the data that isn’t a date. It seems that the input forms are insisting on a value. So when people don’t know (or can’t be bother to find) the date, they are putting in values like “Unknown”, “n/a” or even just “–”. Or they might only have a vague idea of the date so they’ll put “July” or “Q3 2006″.

The second problem is the people who do put in valid dates - but in inconsistant formats. Sometimes it’s “2006-06-13″, sometimes it’s “13 Jun 2006″, sometimes it includes the time, sometimes it includes the day of the week.

Luckily, Perl’s Date::Parse module handles most of these formats. There’s one thing that it can’t deal with though. There is no way of knowing what date is represented by “12-06-2006″. As far as I’m concerned, it’s yesterday (12th June), but it’s an international organisation and the date could be in the illogical US middle-endian format. There’s just no way of knowing (well, short of tracking down the person who entered the date).

All this could have been avoided if the data had been stored in a DATETIME column in the source database. And the users should not be given free-form text fields to enter dates. These problems have turned what should have been a simple task into a major undertaking. And I’m not the only person taking data from this database. I’m sure that other people have had to jump through the same hoops as me.

Databases are powerful tools for describing and storing data. By storing everything in a string you’re limiting the power of the tool. You should always use the most appropriate storage data type for the data that you are storing.