Imagine you’re programming an application and you run across the following (pseudocode) method:
double foo (int num) {
return someNum/num;
}
Any programmer who’s been programming for more than, oh, 3 days, is going to ask “what happens when you pass a zero?” What’s remarkable about this is not just that the original programmer forgot to consider this case, but that many times the original programmer will have all sorts of fascinating arguments about why they deliberately ignored this case. Why is it that many serious programmers would be aghast at these arguments but somehow accept them when it comes to programming a database?
Before considering the sad state of database programming, let’s take a closer look at the arguments for the above function’s skipping argument validation. If the programmer who wrote that function defends his or her decision to not test for zero, the argument is generally along the lines of “I wrote the entire system and I know it’s not possible for a zero to get passed”. This is sometimes followed up with “this function is called a lot and I don’t want the overhead of the extra conditional.”
Years ago, I was writing some COBOL for a system I had just started working on. When I was done, I handed it off to the lead programmer and she deleted a large chunk of my program before returning it to me. What did she delete? Virtually all of my data validation. Her reasoning was that no data could enter the system except through some very carefully designed “control” programs which would not allow any bad data through. As a result, I was testing for things which could never happen. This idea, which seemed reasonable to me at the time, has some serious potential problems.
First, it assumed that whoever was writing software for the system knew about the control programs and would ensure that they were always up to date. This was a large system, she was transferred to a different department and, as a result, I didn’t always know which control program was responsible for validating which data. The second problem is even more insidious. It means that whenever a maintenance programmer looks at a given function, they have to hope that the data validation is in place for it. Further, they have to hope that their understanding of the data validation matches what’s in the control programs. If they’re unsure, they need to wade through the code and hope they find all of the places where the data validation occurs to verify that it’s correct. For large systems, and particularly for programmers who are less familiar with the system, this can be a huge waste of programmer time and is fraught with errors.
The second argument, the one which cites performance concerns, can be a tough one to argue about. Personally, I’d much rather have a slower system which gives correct answers than a faster system which doesn’t. More than once I’ve asked a programmer if they’ve profiled their code to find out if the validation is really a problem only to be greeted with a blank stare. Make sure your software is correct and maintainable first. Then, and only if you have a known performance problem, should you start profiling your code and benchmarking alternate solutions.
What this all boils down to is a simple rule of thumb: keep your data validation close to your data. It’s easier to maintain and is more likely to be correct. If you start duplicating the validation, abstract it out into shared code which is easy to find and reuse.
This brings us back to the aforementioned link where the author concludes:
I’m sure there are a number of arguments that can be made for using foreign keys etc etc, but it’s just stuff that gets in the way in a framework like rails. Do we test w/ foreign keys in place before milestones(what I’m leaning towards)? Do we test w/o the fks in place(obviously not :])? Maybe somebody can come up with a better way to handle them, but I don’t really think you need fks if you’re using rails associations correctly. If you are forced to use them though, the [approach I’ve outlined] might be a good place to start.
While the author is specifically referring to foreign keys not being needed for Rails, the problem is more widespread. I’ve listened to people tell me that they prefer to handle data integrity code at the app level rather than the database level. Certainly there are times where this is more appropriate, but not for something as basic as foreign keys. Concluding that foreign keys are not necessary is very dangerous and short-sighted.
First, keep in mind that a database like MySQL or PostgreSQL is effectively a programming platform with a declarative language (SQL) sitting on top of it. If you have an SQL statement inserting data, you want the receiving table to have the data validation in the form of foreign keys, custom data types, enums, triggers and so on. If you’re writing code for personal use or you control an open-source project, skip this if you want. It’s your code; it’s your choice. However, if you’re writing code for a business, you don’t know who will be maintaining it tomorrow. Maybe they don’t know about your clever data validation routines which allow you to forego foreign keys. Maybe they’re finding your code too slow so they decide to write directly to the database themselves. Whatever the reason, by putting the data validation as close as possible to the data munging, you make it harder to circumvent.


If the original code was Java code, the JVM will throw an exception anyway if num==0. So what would be the point of adding explicit validation?
It's only data; you'd think accuracy were important or something.
It's a technical issue that causes productivity problems.
The whole problem is that the database isn't really integrated in the application framework. I.e. the checks you code, should also be done in the database, the browser and the appserver. If you use PHP/PEAR's HTML_QuickForm classes, you get a check in the browser and the appserver with just one line. And that's the most I've seen until now.
Frank, if you're dealing with Java, yes it will throw an exception and the default exception is quite likely to be fine for your needs. I still feel the explicit validation is good as I tend to adhere to the standard of "a good programmer looks both ways before crossing a one-way street".
However, even granting the Java example, what about C++? Division by zero doesn't throw an exception (though I believe it does if you're only using integers). What's worse, a tiny C program I wrote to test this returned "0.000000". That's not good.
It's not just foreign keys. There seems to be a whole generation of "database programmers" out there who don't use any kind of metadata. I put this down to two things:
1/ People working as database designers without ever having been taught any database theory. Or even reading a book on the subject.
2/ Popular database systems (and, yes, I'm looking at you MySQL 3.x) which didn't have a reasonable level of support for metadata.
Data and it's relationships should be described as much as possible in the database. That's what databases are for. They shouldn't be seen as just some static dump for data.
If you're dropping foreign keys, then what's next? I know, we'll let the application keep track of all of the data means. Let's make each database a single table. It'll have a thousand columns and all of them will be varchar(255).
I agree with the author and the pro-FK comments. Especially by Dave Cross. I think most new "database programmers" see database as an special-featured Excel.
Always put theory (and idea) first!
While in theory I agree, in practice this is much more problematic. If I (as a DBA) have the ability to capture the scope of a requirement and design, build, and implement that scope, my schema may contain FK's, defaults, non-nullable fields, et cetera.
Unfortunately, usually I inherit someone else's mess, designed by someone who did not understand what they were doing, or did not envision the data being used to current requirements. Ultimately, at management's impatient insistence I end up ripping out most of the constraints just to get the durn thing to work in our environment, and once it's working, nobody wants to touch it for fear of introducing more complications.
From an IT perspective, data integrity is a very important thing; however, from a business perspective, it would seem that any data (even inaccurate data) is better than no data. Go figure.
@Jeremy: I do understand your pain. Unfortunately, developers who know a little SQL seem to assume that's all they know about databases. May I recommend picking up the book Refactoring Databases: Evolutionary Database Design as an alternative strategy?
Of course, business types are often so focused on short-term results that they don't care about long-term success. After all, they know that they (and the app) probably won't be here in a few year's time.