SourceForge, still a center of participation in the open source world, has one more day left of voting for the SourceForge Community Choice Awards. Here’s your chance to give your favorite project a bit more recognition for excellence. (Why not choose something that doesn’t walk away with awards every year?)
All right, in the natural vs surrogate key religious war, I rather feel on the natural key side (I have nothing against using a surrogate key as a shorthand for a complex primary key, but I have seen too many people adding to a table a system-incremented column and calling it primary key without other analysis).
But I must admit that counters, whether you call them identity or auto-increment columns, or simply sequences, are extremely valuable for relating the techy’s to the bean counter’s vision of activity. In most applications, there is at least one such counter that represents fairly acurately the business activity - whether it is an increasing order number, invoice number, transaction identifier or whatever. Rather than execute complex queries with conditions on date columns to collect those so precious “key performance indicators”, it may be easier to check the data dictionary, look for sequences or identity columns, and take a daily (or anything) snapshot of the current highest values, together with statistics about the number of executed statements and whatever your DBMS has to offer.
Suppose you’re a college student performing a research project. Part of
that research is a survey with several types of questions. To make your life
easier, you’ve decided to create a small web application to ask the questions
and record the answers — this also gives you a SQL database you can query to
analyze the results.
(No, I’m not a college student — a friend is and the question stumped
me.)
I have been recently involved in trying to improve the performance of an application that primarily handles messages (more and more of them). It is a kind of state machine. Three different types of entities are managed (one of the types happens to be sets of another type), and messages are received that tell what is the most recent status of each entity. So far so good, except that everything revolves around the current status of each entity, and that the main bottleneck seems to be queries that hit repeatedly a table MESSAGE_HISTORY(msg_id, entity1_id, entity2_id, entity3_id, status, changed_by, timestamp), finding out which is the most recent status for a given entity1_id, entity2_id or entity3_id (as you may have guessed, only one of those contains a value in each row), associated with the greatest msg_id for the enity in question.
The solution that looked best was what was presented to me as “denormalizing”, and associating its current status to each and every entity.