What’s the worst bug you’ve ever written? Or if you’re a sysadmin, what’s the worst system problem you’ve allowed to happen?
It sounds like a starting point for war stories over beers at OSCON, but it can tell you a lot about yourself. Analyzing mistakes is a crucial skill for everyone, but especially for programmers, whose mistakes can be so devastating and yet so easily avoided the next time.
I ask this question of every programmer I interview (and I’m still looking for programmers, by the way) to get an idea of the candidate’s self-awareness, and to get a feel for her background. If she can’t think of one, then I know she hasn’t been around very much.
For me, it was an Exchange conversion project…
I had just started at a new company and was eager to show my chops. My department was in charge of the company mail servers, and they were upgrading from one version of Exchange to another. For some reason, the conversion process would not bring over mailboxes from the old version of Exchange to the new one. Dozens of users had hundreds of mailboxes each, containing correspondence with customers.
Hotshot Andy spent a day or two using one of the Perl Win32 modules, calling Outlook OLE objects to read data from the old Exchange instance and write it to the new one. My program would log in, suck up the mailboxes, log in on the new system, and create new mailboxes. The messages converted fine, and I had many safeguards to make sure that message counts before and after were the same. It worked beautifully.
We spent a weekend migrating over the data, as well as all Outlook clients, and Monday morning brought no complaints. We were all pleased with how well it all went. Around Thursday, well after the point of no return, the complaints hit.
It turns out that these mailboxes were organized into folders, and my program hadn’t taken that into account. All user mailboxes were in the top level of the hierarchy. All organization was lost. Worse, they couldn’t get back to the old instance of Exchange to see how things had been organized. We couldn’t even do the grunt work of recreating the hierarchy because we weren’t familiar with the data.
Then, after the candidate tells me the story of the terrible bug, I ask the crucial follow-up: “What did you learn? What did you change about yourself?” The reaction is often telling, and I can easily see how self-aware she is. If the answer is “We fixed the bug and had to do some cleanup,” then I know nothing’s been learned. If she comes back with an “I’ll tell you one thing: I made sure that my X always….”, I know that she’s a self-optimizing person.
In my case…
The crucial error was making assumptions about the data. I had created my own dummy mailboxes, with my own dummy data in them, rather than using real live data. If I’d looked at live data, rather than assuming that I knew what it would look like, the hierarchy would have been immediately clear. “Always look at real data early in the project” is a long-standing maxim of mine.
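To make the failure concrete, here is a minimal sketch in Python. It is not the original Perl program, and everything in it is hypothetical: the sample messages, the folder names, and the helpers `flatten`, `total_count`, and `per_folder_counts`. It assumes each message can be represented as a folder path plus a message ID. It shows how a before-and-after message count can pass while the folder structure is lost, and how a check shaped like the real data would have caught it.

```python
from collections import Counter

# Hypothetical sample data: each message is a (folder_path, message_id) pair.
source = [
    ("Inbox/Customers/Acme", "m1"),
    ("Inbox/Customers/Acme", "m2"),
    ("Inbox/Customers/Globex", "m3"),
    ("Inbox/Invoices", "m4"),
]


def flatten(messages):
    """Simulate the bug: every message lands in the top-level folder."""
    return [("Inbox", msg_id) for _, msg_id in messages]


def total_count(messages):
    """The kind of safeguard described above: one overall message count."""
    return len(messages)


def per_folder_counts(messages):
    """A stricter check: how many messages sit in each folder path."""
    return Counter(folder for folder, _ in messages)


migrated = flatten(source)

# The overall count survives the flattening, so this check passes.
counts_match = total_count(source) == total_count(migrated)

# The per-folder breakdown does not survive, so this check fails.
folders_match = per_folder_counts(source) == per_folder_counts(migrated)

print("total counts match:", counts_match)
print("per-folder counts match:", folders_match)
```

The per-folder check is only obvious once you know folders exist, which is exactly what a look at live data would have revealed.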
Think about it over the next few days. What’s your biggest programming mistake? Did you change anything? Or maybe you find you’ve over-compensated, and are overly cautious? How do you optimize yourself?
What’s your worst bug? What did you change? How do you make sure you’re improving?