How many open source relational databases can you name? My friend Gabrielle recently sent me the links to two of them. However, if you’re like most technical people, you probably don’t know any — just as I didn’t until recently. I can already imagine many of you saying “bulls**t”, what about MySQL and PostgreSQL?” (to name just two), but those are just databases, not relational databases. Noted (and controversial) database experts Chris Date and Hugh Darwen, building on the work of the founder of relational theory, Edgar F. Codd, have tried to educate people about the actual relational model but with little success. It’s a strange world of relations, relvars, attributes and other things which sound familiar, but when you look closely, they’re not.
Recently I found myself quite proud of taking a moderately complex query and optimizing it so that it took about 5 seconds to run instead of 15 minutes. In running it on a small subset of data, it worked perfectly. When running it on a larger subset, though, it turned out to be broken. One number which should have been around 1000 was actually 22,000. Another programmer and I sat down with it and worked through everything and spent a couple of hours debugging it. After a while, the bug was embarrassingly obvious. One of the INNER JOIN statements was a one to many relationship instead of a one to one relationship. It was returning and summing duplicate rows. We’ll come back to this in a moment.
In the relational model, instead of tables, you have relations. There are a number of differences, but the key one here is that a relation does not allow duplicate tuples (analogous to “rows”). In SQL, you can also create views. Views sort of look like tables, but depending on the implementation, you have various limitations to how you can update, insert, and delete with them.
(I’ll start using the normal “table”, “row” and “column” terms now, but keep in mind that they’re not really the same thing)
In the relational model, based on set theory, a view is simply another table. In fact, when you select data, you’re simply returning a new table. This means that you can do anything with a view that you can with any other table. You can update, insert and delete from them and these changes will automatically propagate to other tables. What this means is that when you select data, you’re guaranteed not to get duplicates. My bug in the SQL I mentioned earlier? It would not be possible.
Interestingly, I’m told that Microsoft once tried to implement this with their Access database by having an implicit DISTINCT on all queries. As it turns out, this was a frequent source of bugs. Just because values are the same doesn’t mean that they represent the same thing. It’s not true that two guys named “John” are the same guy. Microsoft’s scheme might have worked if they had automatically removed duplicate values if and only if they came from the same row, but apparently they didn’t.
Another interesting feature of the relational model is that all columns must have their data types defined. You would have to specify their domain of values and the operators which can operate on them. What does this mean? Consider the following SQL statements.
SELECT first_name FROM customers WHERE age = id; SELECT first_name FROM customers c, orders o WHERE c.id = o.id;
Any of us can glance at those and know they probably don’t make a lot of sense. As queries get large, however, those types (no pun intended) of bugs can creep in. One system I worked on had exactly that type of bug in SQL which was generated on the fly from metadata and it was extremely painful to track down. We frequently had auto-generated 200 line SQL statements and wading through them was an exercise in masochism.
In the relational model, statements like those wouldn’t even compile. The age and id columns would all have their own data types and you simply wouldn’t define operators allowing them to be compared (c.id and o.id would also be different types). However, you could do the following because o.customer_id would have the same data type as c.id. You would define the ‘=’ operator but not ‘>’ or ‘<’.
SELECT first_name, last_name FROM customers c, orders o WHERE c.id = o.customer_id;
In normal SQL, there’s a good chance that query would return duplicate rows and if that’s a subquery, it could cause all sorts of hard-to-detect problems. But you already know what I’m going to say: that’s not possible in the relational model because duplicate rows can’t happen.
The two open source relational databases I mentioned earlier are Rel and LEAP RDB. Since SQL is not relational, neither supports it, but LEAP has plans to support a subset of SQL at some point in the future. There’s also a commercial product named Dataphor, offered by Alphora. I’ve heard good things about it but have not had the chance to use it even though they offer a fully functional evaluation copy.
The more I work with databases, the more I grow frustrated with not having the relational model available. Allegedly, when IBM finally started trying to follow this model, they built a query language named SEQUEL (sound familiar?) but the developers didn’t know much about Codd’s work and had little contact with him. Larry Elison then borrowed their ideas for Oracle and this has led us to the sorry state we’re in today. Fortunately, it seems that a number of people are realizing that the industry standard SQL query language and the databases they are built on top of them have significant limitations, so competitors are starting to sprout up. Competition is Good Thing.
For extra credit, you can read about why NULL values are a source of bugs in databases.


all relational query can be transform quite well into sql ..
i can see no (other then academic) reason to say sql is not relational enough.
many people just call posgtresql, oracle and mysql RDBMS (as opposed to OODBMS)
@SDiZ: I hear people say that a lot. When I first learned programming back in the 80s, I used to think that BASIC was good enough, too. The reason I thought that is because that's all I knew. Today, I hear people defend Perl 5's broken OO model because that's all they know. I hear people tell me that Java is a "pure" OO language because that's all they know. Remember in the late 90s when many MySQL developers argued that foreign keys didn't belong in the database? (Sadly, some still do.) That's because MySQL is all they knew. I believe this is the same reason many people today defend SQL against the relational model.
You can assert that SQL is "relational enough", but to do that you have to explain why SQL's ad hoc decisions are better than the set theory on which the relational model is based and you have to at least attempt to address some of the points raised above. Simply saying "that's not true" doesn't let me understand why you disagree.
A couple clarifications.
First, while in theory the relational model allows for updateable views, in practice, only some views can be updated. For example, a view based on a summary query with (ala SQL's "GROUP BY") is not likely to be updateable, nor would many complex views based on UNIONs.
The problem is that in some cases you simply cannot "go the other way" from the view to the source relations in any logical manner.
That said, all views that can be updateable should be, though this has less to do with SQL vs the relational model than the fact that today's SQL products just don't support this. It's worth noting that Postgres lets you manually describe how to make views updateable using their rules feature.
On the subject of data types, the significant difference regarding what Date & Darwen describe is that they think that defining new data types is a core feature of a relational system. Various SQL products today let you define new types with varying degrees of ease, mostly "not-so-easy". There's nothing in the SQL standard to say it has to be so damn difficult, though ;)
To say 'i can see no (other then academic) reason to say sql is not relational enough' is rather like believing that it is possible to be "somewhat" pregnant. But explaining to sql users that just as a biological condition cannot exist neither can a "like" relational database exist is a terribly difficult task. Sql users really have no idea of what a relational database is yet they are convinced they're favorite sql database is one:) That the major sql vendors are themselves knowledgeable about the distinction between a "value" and a "variable" let alone between sql and relational is an open question. I rather think a more mundane approach to the issue is required. Something like a coloring book with lots of pictures for children. And very, very basic stuff. So instead of trying to explain what a table "type" is I try to show that dynamic sql is necessary as the result of not having a table type available. Whey sql cannot simply rename a column without rebuilding the table. Why in sql you cannot pass a table as a parameter. That passing multiple scalar parameters to a procedure is the result of not having the concept of a "row" available, etc etc. etc. And just hope the reader will connect the dots (sort of like the 50s TV show 'Winky Dink') I have been blogging about these things using Dataphor (likes firefox best) @
with words and code (the pictures:). It is imho a brilliant attempt at capturing the nature of a relational database and the significant benefits it offers for application development. I encourage everyone interested in comparing "somewhat" pregnant to "being" pregnant to try Dataphor.
Thanks for the opportunity to express my view.
200 lines ? I scoff at your 200 lines. We have a commercial reporting package that generates SQL statements up to 9000 lines long. Try debugging that one (as I had to recently) !
"SELECT first_name FROM customers WHERE age = id;
SELECT first_name FROM customers c, orders o WHERE c.id = o.id;
In the relational model, statements like those wouldn’t even compile. The age and id columns would all have their own data types and you simply wouldn’t define operators allowing them to be compared (c.id and o.id would also be different types)."
From Dataphor:
select (Orders rename A_order) join (OrderDetails rename Many_orders)
by A_order.OrderID=Many_orders.UnitPrice
//Application:(error)105257 --->
//No signature for operator "iCompare" matches the call signature
//"(System.Integer, System.Money)".
Take a look at vlerq for one more relational database...
http://www.equi4.com/ratcl/vlerq
Just wondering if anyone here knows if the original Postgres (before the change to SQL I think they used a language called postquel) was a full relational database. I do know that it touted being based on strong types and an ability to add new types.
If so then is it possible to combine the old version with some of the new improvements?
Perhaps of interest to this group:
A list of projects related to the work of Hugh Darwen and Chris Date is available at The Third Manifesto web site which is dedicated to their restatement of the relational model.
In addition to the somewhat daunting Third Manifesto book, Chris Date has published Database in Depth which might be viewed as an introduction to the earlier work.
In my opinion both works should be compulsory reading for us all.