An Interview with Chris Date
by Tony Williams
Editor's note: Tony Williams conducted this interview with Chris Date shortly after the release of Chris's new book, Database in Depth: Relational Theory for Practitioners, from O'Reilly. In this extensive conversation, Chris debunks a good deal of misinformation about the "weaknesses of the relational model"; discusses the impact of The Third Manifesto, his classic book with Hugh Darwen; evaluates the future of SQL, as well as his past comments on the language; and closes with his thoughts on the future of DBMSs (database management systems). If you want more, you'll find further discussion of many of the questions raised in this interview in Database in Depth.
Tony Williams: How did you get started with relational theory?
Chris Date: As with so many things, this was basically just luck--a matter of being in the right place at the right time. I was working at IBM in England, where I had been hired as a programming instructor. I had been doing that job for a while, but IBM had a very enlightened policy according to which you couldn't spend all your time just teaching--from time to time you had to rotate out and get down into the trenches, as it were. So I rotated out and joined a little research group, where I was given the job of figuring out what the PL/I language should do to support this new thing called database management (this was early 1970). So I played with IBM's database product IMS--that was IBM's major product offering at the time--and I studied the CODASYL DBTG database specifications; IMS was hierarchies and CODASYL was networks. Then Ted Codd published his famous paper "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, Vol. 13, No. 6, June 1970). So I read that paper, and--speaking here as a mathematician!--to me it was obvious that the relational model was the right way to go. Looking back, if I'd realized how long it was going to take to get the world at large to agree with that position, I don't know if I would have been quite so enthusiastic ... Anyway, I began corresponding with Ted at that time, I met him some little while later, and one thing led to another.
Tony: In the introduction to your new book you write that some things needed to be said again. How would you summarize those?
Chris: Goodness! Where to begin? There's so much nonsense out there ... so little true understanding, that it seems to me that just about everything needs to be said again. Perhaps I can illustrate by quoting a typical document off the Web. The document I have in mind is called "Weaknesses of the Relational Model," and it comes from an academic institution in Germany (so the lack of understanding I'm talking about certainly isn't limited to the USA, nor is it limited to the commercial world). Here are the alleged "weaknesses," quoted verbatim except that I've numbered them for purposes of subsequent reference:
1. With regard to data modeling, you can't define attributes which are of complex types (arrays, records, tables). Each relation has to be in first normal form. Or in other words: A "simple" natural structure must be divided into many flat structures (= tables/relations). The result is a complex structure of relations.
2. Result of a query is a flat table. Any complex structure, which has been input of the query has got lost.
3. No way to define recursive program structure (using SQL).
4. The type system of SQL doesn't match with the type system of the embedding language ("type mismatch").
5. Controlling integrity constraints costs a lot of time (need to control the usage of primary/foreign keys).
6. Lack of mechanisms to control the physical level of the database (only simple clustering).
7. Definition of operations detached from data definition.
Considered as "weaknesses of the relational model," every single one of these is just plain wrong. Number 1 displays a lack of understanding of first normal form; ditto number 2 (and by the way, that phrase "flat table" all by itself demonstrates a huge failure to understand what relations truly are; I could write several pages on this issue alone). Number 3 is not only false as a statement of fact, it also makes the mistake of equating the relational model and SQL--as does number 4. Number 5 is 180 degrees wrong; I mean, not "controlling integrity constraints" is what costs us, and I don't mean costs only with respect to time--there are other costs too, ones that I regard as much worse. Number 6 ignores the fact that the relational model deliberately has nothing to say about "the physical level of the database"; one objective for the relational model was always to give implementers a high degree of freedom at the physical level, in order to make possible a high degree of data independence (I discuss this issue at some length in my O'Reilly book, Database in Depth, as a matter of fact). And number 7 is just flat wrong again.
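The interview doesn't spell out why number 3 is false as a statement of fact. One concrete counterexample is that the SQL standard has supported recursive queries through WITH RECURSIVE since SQL:1999, and many SQL products implement it. The sketch below uses Python's built-in sqlite3 module, so SQLite's dialect stands in for standard SQL; the table `mgr`, its columns, and the function `reports_under` are hypothetical names chosen only for illustration.

```python
import sqlite3


def reports_under(conn, manager):
    """Return every employee who reports, directly or indirectly, to `manager`.

    Illustrative only: assumes a hypothetical table mgr(emp, boss).
    """
    rows = conn.execute(
        """
        WITH RECURSIVE chain(emp) AS (
            -- anchor: direct reports of the given manager
            SELECT emp FROM mgr WHERE boss = ?
            UNION
            -- recursive step: reports of anyone already in the chain
            SELECT m.emp FROM mgr AS m JOIN chain AS c ON m.boss = c.emp
        )
        SELECT emp FROM chain ORDER BY emp
        """,
        (manager,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data: B and C report to A, D to B, E to D.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mgr (emp TEXT PRIMARY KEY, boss TEXT)")
conn.executemany(
    "INSERT INTO mgr VALUES (?, ?)",
    [("B", "A"), ("C", "A"), ("D", "B"), ("E", "D")],
)
print(reports_under(conn, "A"))  # ['B', 'C', 'D', 'E']
```

Using UNION rather than UNION ALL discards duplicate rows, which also keeps the recursion from looping forever if the sample data ever contained a cycle.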
In a later question you mention the fact that the O'Reilly book has an epigraph from Leonardo da Vinci, and I'll get to that particular question in due course. But I think it's pertinent to mention here that the book has a second epigraph also, from Josh Billings: "The trouble with people is not that they don't know but that they know so much that ain't so." The document just quoted illustrates this point only too clearly.
By the way, if you're not familiar with it already, you might like to take a look at Fabian Pascal's website, which contains (among other things) numerous ignorant quotes like the one at hand, together with some analysis and deconstruction of those quotes on Fabian's part. I contribute to that website myself on a fairly regular basis.