If you have a sourceforge account, and are on your way to becoming the best thing to happen to the web since Yahoo or Google, then I beg of you to put a call out for people who understand database design fundamentals.
Designing an interface with PHP is one thing. Designing an “application” is quite another, as it includes designing the architecture of the application, how the various components of the application will interact and communicate, and also how the data used by the application will be managed and stored. This last piece is a decidedly un-sexy part of application design, and is often also (and unfortunately) trivialized by developers.
Here’s the story: If you’re a PHP developer, I don’t really want you to learn how to design a database. No, really. I don’t. I want you to write PHP. There are few people who do both things extremely well, because both take a good bit of time. If you’re a PHP developer, I want you to write code that’ll make my head spin. However, the path to greatness is to be conscious of your own ignorance — so just acknowledge that you’ve never done or studied database design, and go find someone who has!
I have NEVER seen a help wanted ad on Sourceforge for data designers. I’ve looked. I occasionally go back and repeat my searches. I’m always disappointed. Meanwhile, while there are applications that practice sane database design fundamentals, the overwhelming majority of applications I’ve evaluated have also left me disappointed.
I believe that this problem roots from a few different problems and misconceptions, which I’ll list here:
1. The interface determines database design
I disagree. I think approaching a database this way is somewhat short sighted. Doesn’t that statement imply quite strongly that any future changes in the interface will inevitably mean a change in the database design? And then, if we’ve agreed that interface changes affect database design, doesn’t that raise the chances that future releases are either going to be incompatible with earlier ones, or will involve laborious database migration schemes?
Furthermore, this notion breaks down quite quickly when you get into data warehousing schemes where more than one application is acting on the data. Then what? Which interface is in charge of defining the database?
Really, database design and interface design should only be related through the data entities themselves. The data model is the story of how data is related, and the interface exploits the relationships to provide a view of the data to the user. Aside from that, interface design and interface design are unrelated.
2. “I only need a small, simple database”
Maybe that’s true, but double check yourself. If you think that you only need three tables, but those tables are a mish-mash of duplicate data, null fields, or (the worst) fields that hold more than a single value, then your database design would appear to be the result of making SELECT statements as simple as possible instead of sane design principles. As your application marches ahead, you’ll eventually find that you need to overhaul the database, or scrap it completely and start over, which can be a pretty hellacious event to deal with as a developer. That database API you worked so hard on is all but useless, and it could’ve all been avoided if you’d just run all of this by a database person. In a lot of the instances where I see database design completely fail, it’s because the designers confused “data design” with minimizing the number of tables involved. In reality, more normalized data usually results in more tables, not less.
3. “More tables makes for harder coding”
Well, fewer tables, if it’s at the cost of normalization, makes for certain refactoring when you have to overhaul your data design because it can no longer support the needs of the application. Take your pick.
Really, it doesn’t have to be harder to code against in a lot of cases. Usually, inserts end up being slightly more complex. However, if you’re using a database that supports database “views”, then select statements will hardly have to change at all. If you don’t know what views are or how to create them, you should definitely seek the help of a database professional.
In Conclusion
Look, I’m not an advocate of using fifth normal form for everything. In fact, I’ve done a lot of database consulting and have never seen a database in fifth normal form. However, shooting for third normal form generally results in a database that is flexible enough to move in whatever direction your application decides to move, while at the same time keeping things simple enough to keep your PHP coders from having to become hardcore SQL wizards. Data design is a dry, monotonous, maybe masochistic practice. But it’s one that will pay you dividends well into the future.


"I don't really want you to learn how to design a database. No, really. I don't."
In that case I'd be interested to know which articles/books you would not recommend we read in order that we don't improve our knowledge of database design? I wouldn't want to accidentally learn anything which may just lead to troubling improvements in my design.
Seriously though, It seems as php developers we are moving well ahead with OOP, design patterns, unit testing, etc and there are plenty of sources of information for these, however, I've not yet been able to find a definitive selection of database design books that focus more on the application side rather than the database admin side, so I'd be interested in any recommendations.
Thanks,
Nick
0_o, I couldn't agree more. I've done a lot of Oracle projects and have gotten exposed to a good database design, however I have to say that it takes discipline to see that your database design contains a flaw and then actually fix it along with the PHP code... In the middle and long run, it of course pays off a hundred times.
Good database design is incredibly important if you want your application to scale. This means you also probably want to compile the database yourself, optimize its variables beforehand, practice good SQL coding standards, load balance, replicate, cluster . . .
LAMP has become really easy. But doing it well is hard.
Im not a guru in this kind of matters, but i agree with 0_o and Jeremy Foster.
"LAMP has become really easy. But doing it well is hard.".. nowadays everything is auto!
I think what most people fear of db design is relational algebra and normal forms, they are complex, yeah, but if you use them it will be way easier to design a good db.
Great article! There's only one thing I don't quite agree with:
"Here's the story: If you're a PHP developer, I don't really want you to learn how to design a database. No, really. I don't. I want you to write PHP. There are few people who do both things extremely well, because both take a good bit of time."
Although it may be true that few do both extremely well, every PHP developer should learn good database design. If you compare the two (code design and database design), they both have the same basic principles: use consise yet easy to understand names for variables and methods, remove duplication in your code through refactoring, don't let design suffer through premature optimization, optimize later when you know what needs optimizing - don't guess. These principles, and many more I'm sure, are also shared with good database design.
Learning database design helps the developer see these core principles in a simpler context. I am certain that learning database design has improved my code design immensely.
Your comments are sad, but true. Data modeling is hard, so folks don't want to do it. I, too, have never seen a properly normalized database (aside from small examples) and I keep getting bitten by this. Unfortunately, as databases grow, it's possible to have different database schemas for the same data and yet still have some disagreement over which is "more correct" (one database merging project I know of broke down when the DBAs couldn't agree over how to define "color" in the database). And changing business rules dictate different data layouts, further compounding the problem.
One example is how a customer address is to be stored. This address is almost always kept in the customer table. This becomes problematic if you want to query all addresses for which you don't have a customer. Businesses are constantly buying address lists because they want to send bulk mailings but don't know which addresses they have real customers at. Also, how do you represent that address? Should the street name be in a different table from the street numbers? Theoretically, they should. Recently in Portland we had a street renamed from "Front Street" to "Naito Street". In a properly designed database, you would only have to change the name of the street in one place. I've yet to see a database which implements this, though.
There are all sorts of little design issues which make designing a database far harder than folks think. The address example is where two performance issues come in. One is the programmer overhead of constantly writing more complicated queries and the other is the performance of the queries themselves. Almost always you'll see "123 Washing St, Apt 3" in an "address" field rather than breaking those down to their component parts. Further, while we should always design our databases correctly, real-world data doesn't always map cleanly to the relational structure in the database. Anything which requires a recursive definition shows how painful this can be.
Bah. I'm rambling (I tend to do that). What I meant to say was "thanks!" I'm very glad to see that some programmers are conscious of this issue. We need more like you.
intertidal zone food web
Good points. I'm using Splunk Base myself and shared some of these concerns as I began submitting samples. In the end, my fears were allayed and I ended up submitting many raw log samples with out any annonymization. On the other hands, certain logs were obfuscated while preserving the vital bits. This approach provides maximal benefits to the community while still preserving my sometimes paranoid attitude towards systems under my care.
Takeaways:
http://www.xanga.com/tiffanylamp
Actually, it's almost always true that newly written code, at least in the short term, contains more bugs than legacy code. This is purely because the legacy code has had so much testing and so much time fixing corner cases. Rewriting the code from scratch loses a large amount of the knowledge that is gained from all that maintenance work on the legacy code and it is rarely documented accurately or completely enough to avoid the loss.
Hi
http://ringblogstar.name/
Bye
dasad
great article! bytheway, database design is a fun and interesting practice, but i guess it does depend on the person....
and to OVID -
i agree that the correct way to store an address is a situation where there are more than one correct answers to, but storing the numbers in a table separate from street names would never be done.
Also, why wouldn't you just:
update customers
set address = REPLACE(address, 'Front Street', 'Naito Street')
instead of over-normalizing the address just because you might have to change street names? i dunno, but seems like more work to me... ;-)
The article brings out a great point: that people don't design their databases for flexibility. I agree that in most cases they should, but it shouldn't be forgotten that when one spends too much time focusing on the generality/flexibility of a database, he/she usually won't get around to building the application. The hole gets deeper due to the fact that as the database gets more flexible, the queries become more complex, taking more time per feature. This leads to a time mis-calculation by 50% when deciding on the level of generality of a database.
I have become a huge fan of short iterative development with many many prototypes and alterations. And this is not just limited to the software, but includes the database design as well.
With PHP's reasonable OOP support, using the correct layering of code, the changes to the database need to be coupled with changes to the second tier of data access (the first being a database query abstraction layer) but each method/function above this second layer should not need to be altered at all except adding new methods that the new level of generality allows.
That being said, it's important to design a database to an appropriate level of generality via the normal forms, but "appropriate" is dependent on the current stage/iteration of the software, not what the software is envisioned to be one day far in the future.
http://justbestnews.blogspot.com/
http://vava-free.blogspot.com/2007/09/california-state-
university-chico-chico.html
http://basevirus.com/deas/6/