Organizing documents, and especially e-mail, is a never-ending endeavor. Indeed, people are, by nature, messy. I know of no large organization where people have heard of e-mail threads (meaning you get replies about an upcoming department meeting quoting your birthday wishes note from 2001), formatted subject lines (meaning a simple e-mail exchange will usually see 3 spelling variations of the same words), specific headers (but that may be because most clients don’t make them easy to add) or any other trick one would naturally think of to get organized.

Come to think of it, there is no reason why people should know or care about these things. After all, they’re doing their work under a lot of pressure and, even if we geeks think being a little more organized couldn’t hurt, we cannot ask someone who is relatively new to computers to get hyper-specific with e-mail management.

Search systems like the life-saving Zoë or dedicated services like GMail have made our lives easier. Even Spotlight has made finding mail on Tiger a lot more efficient and easy than it used to be. However, as I mentioned in the past, we should not forget that, no matter how easy these systems make the retrieval of information, they do not organize content for us and we should not be lulled into a false sense of security — what if the system breaks, if we change platforms or some yet-to-be-determined incident makes our organization obsolete?

A good and widely accepted trick is to put mail in folders organized by project and, within these folders, to organize messages by sender, like “John” in “Project Bubble Gum”. Sure, John may send you a mail referencing both project Bubble Gum and project Carrot Sticks but, even in the worst case, you’ll only have to look in two folders to retrieve the message, without automated assistance. Classifying mail and documents by date is also an option, although it makes the retrieval of files out of the blue a lot more difficult.

With a sensible classification system, finding a particular document should be possible. It may require a lot of effort, a lot of work, but, as long as you have a vague idea of how you proceeded (and you stick to it), it will be possible. The problem, however, lies in referencing the document.

Indeed, how do you, for any reason, reference a specific mail you received? Often, we have to resort to the likes of “the mail I sent you on January 1st 1969, at 13h 00 UTC regarding project Leather Shoe”. That is all very well and it’s certainly precise enough to go to court with, but it’s a pain.

Recently, I started playing with GUIDs (Globally Unique Identifiers). By tagging every document with an almost-random identifier, I can easily reference it once I have found it. Sure, I may seem crazy when I ask people to look up file ID “e43dgff44332fgfDFvc” but, in my experience, once they understand that the freakishly long ID is there to ensure there won’t be two files with the same ID, and that they can actually copy and paste it from an e-mail into their search application, people respond very well.
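To make the idea concrete, here is a minimal sketch in Python of what tagging and looking up by ID could look like. It assumes documents are plain text files in a folder tree, and the “Document-ID: ” marker line and both function names are made up for the illustration, not any kind of standard.

```python
from pathlib import Path

# Hypothetical marker line; not a standard mail or file header.
TAG_PREFIX = "Document-ID: "


def tag_text(text: str, doc_id: str) -> str:
    """Append a marker line carrying the ID to a document's text."""
    return text.rstrip("\n") + "\n" + TAG_PREFIX + doc_id + "\n"


def find_by_id(root: Path, doc_id: str) -> list[Path]:
    """Return every file under root whose contents mention the ID."""
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            content = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        if doc_id in content:
            matches.append(path)
    return matches
```

Because the lookup is a plain substring search, the same ID can be pasted into Spotlight or any other search tool and should turn up the same file, whatever folder it was filed in.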

Of course, this brings us to the problem of generating a GUID. It needs to be sufficiently long to be unique, needs to have no cryptographic value whatsoever (or you’re just about sure someone will try to use them as digital signatures) and must not reveal any information about your computer. That last point just about rules out the otherwise very useful “uuidgen” command on many platforms, where it may produce time-based IDs that embed the address of your network card.
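One candidate, sketched here in Python under the assumption that a purely random ID is acceptable, is a version 4 UUID. It is built from random bits rather than from the clock and the network card, so it says nothing about the machine that produced it. The function name `new_document_id` is made up for the example.

```python
import uuid


def new_document_id() -> str:
    """Return a random (version 4) UUID as a 32-character hex string.

    Version 4 UUIDs carry no timestamp and no hardware address.
    The result only identifies a document; it proves nothing about
    who created it and should not be treated as a signature.
    """
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(new_document_id())
```

The hex form has no dashes, which makes it a little easier to select with a double-click and paste into a search field. Whether 32 characters is too long for comfort is a matter of taste.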

So far, the system seems to work, but I’m still working on how to generate the best possible GUID. Anyone interested in this challenge?