It’s the bane of many a programmer. The bug is reported, users are complaining, but management says “it’s too expensive to fix”. The basic idea is pretty simple. Customers running OS X start complaining that your software keeps crashing whenever they access certain features. Management sits down and figures out that they’re losing $1,000 a year because of this. They then get an estimate of how long it will take to fix and they hear 20 hours. However, that doesn’t include the overhead of project management, testing, delivery, etc. They do the math, and realize it will cost them about $9,000 to solve this problem, or 9 years to recoup their investment. That’s $9,000 they can’t spend elsewhere. If “elsewhere” is likely to generate more revenue, then they’re losing a heck of a lot more than $9,000, so they figure it’s not worth the money to fix.
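To make management's arithmetic concrete, here is a minimal sketch of the payback calculation described above. The function name and variable names are made up for illustration; the figures are the ones from the example, and the model deliberately ignores everything except the direct loss and the fully loaded cost of the fix.

```python
def payback_years(fix_cost: float, annual_loss: float) -> float:
    """Years until the avoided annual loss covers the cost of the fix."""
    if annual_loss <= 0:
        raise ValueError("annual_loss must be positive")
    return fix_cost / annual_loss


# Figures from the example above.
annual_loss = 1_000       # revenue lost per year to the crash
fully_loaded_fix = 9_000  # 20 hours of coding plus management, testing, delivery

print(payback_years(fully_loaded_fix, annual_loss))  # 9.0
```

Note what the calculation has no inputs for: anything to do with people.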
Now right off the bat, this argument has merit and it should not be dismissed lightly. If you have limited resources you have to sit down and figure out how much money is in the piggy bank. However, while it often has merit, it’s also often wrong. There’s one area which this ad hoc analysis doesn’t consider: people. When you forget about people, your results will be wrong. So now that you know how “too expensive to fix” works, let’s see why it’s so often wrong. The problems lie in three main areas:
- Broken windows
- Employee morale
- Fast food restaurants
Broken windows
There is a theory in sociology called the “Broken Window” theory. Basically, it claims that you can tell when a building is going to go downhill by the first broken window that remains unfixed. That window gives vandals license to break other windows, scares potential tenants away and starts a downward spiral in overall quality. Got a leaky pipe? It’s just One More Problem.
Now there’s a lot of controversy over whether or not this is true for buildings, police work, and other areas of public life to which it’s been applied, but it’s definitely true for software development. How many of you have looked for warnings in your Apache error log only to see so many warnings that you can’t decide what’s a real problem and what’s not? Finding the information you need becomes expensive.
Test suites are another example. I remember accepting one job because, amongst other things, the programmers were all strong testing advocates. When I arrived I found a huge test suite that took over half an hour to run and many of the tests were failing. I was told “don’t worry about them, those are false negatives”. The problem was, it was awfully difficult to see when a “real negative” appeared. I was responsible for upgrading our version of Perl but I couldn’t do it until those test failures went away because there was no way I could easily see if the new version of Perl introduced new bugs. As a result, my work on the upgrade was much more expensive than it needed to be (when I left the company, they still hadn’t upgraded).
Broken windows are a real problem. If you leave customer bugs unfixed, then when a new bug report comes in, it becomes “just another bug”. Leading by example shouldn’t be thought of as a strategy; it should be thought of as “the way things are”. If your manager doesn’t take bugs seriously, why should you?
Employee morale
How would you like to take part in the following interview?
Interviewer: So, what was the last project you worked on?
You: Oh, I was the lead programmer on www.example.com.
Interviewer: Oh, I’ve heard about that. Didn’t it fail because of all of the bugs?
What do you do? Tell ‘em you’re a lousy programmer? You might as well walk out of the interview. Defend the product? You’re contradicting the interviewer, rarely a good strategy. Tell ‘em your management wouldn’t let you fix the bugs? That’s almost worse because many interviewers will automatically reject candidates who badmouth previous employers (they know that you’re probably willing to turn around and badmouth them, and it might also signal an attitude problem). It’s a tough situation to be in and frankly, when bugs start mounting up, employees know that this reflects on them and it’s embarrassing. I’m proud when my software works. I’m embarrassed when my software fails.
Morale plummets and employee turnover starts to increase. This can be a very difficult thing to quantify. What’s worse, once a couple of employees start to leave because “they can’t take it any more”, other employees start thinking the same thing. I’ve seen companies suffer a mass exodus because of this problem. It’s even more difficult to get a handle on because employees rarely tell management “I’m embarrassed to work here”; they know better than to burn their bridges. Instead, management hears “I found a better paying job”, “I found a place closer to work” or “I want to move to Portland, Oregon and I found a job there.” (Portland being one of the most beautiful cities in the US, the latter excuse is understandable).
Employee morale about working on products they can truly be proud of is a huge factor, but one that is often not considered because it’s almost impossible to know how to quantify it.
Fast food restaurants
I was 21 years old and had my first management job, working at a huge fast food chain. We had lots of money, but we also had a big problem. Our customer satisfaction surveys were terrible and our growth was stagnating. Since the company had a lot of financial resources, they hired an outside firm to figure out what was wrong. When we finally got the results, they were fascinating. Our customer satisfaction surveys were only giving us the results of the customers who were willing to take the trouble to complain. That sounds obvious, but what we didn’t realize was that the vast majority of our unhappy customers left quietly without saying a word and had no intention of returning (I can’t recall the exact number, but I think only around 1 in 20 customers were telling us why they were upset). We were only finding out why the vocal customers were unhappy and it turns out they were unhappy for different reasons than those who just went away (I won’t go into those reasons lest I identify the company).
What’s even more disturbing was the next item of the report. It’s not that our unhappy customers who didn’t say anything to us weren’t talking; they just weren’t talking to us. Instead, they were telling an average of five other people about how awful we were. Every time we upset someone, we risked losing six customers without any chance to find out why!
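As a rough back-of-the-envelope sketch of those numbers: if only about 1 in 20 unhappy customers complains, and each unhappy customer tells about five other people, then every complaint you actually hear stands in for a much larger group. The function and parameter names below are hypothetical, the defaults are the approximate figures recalled above, and treating everyone who hears the story as a customer at risk is a deliberately pessimistic assumption.

```python
def customers_at_risk(complaints: int,
                      one_in: int = 20,
                      told_per_unhappy: int = 5) -> int:
    """Estimate customers at risk from the complaints actually received.

    one_in: assume 1 in `one_in` unhappy customers bothers to complain.
    told_per_unhappy: assume each unhappy customer tells this many others.
    """
    unhappy = complaints * one_in  # the silent majority, plus the vocal few
    # Each unhappy customer puts themselves and the people they told at risk.
    return unhappy * (1 + told_per_unhappy)


print(customers_at_risk(1))   # 120
print(customers_at_risk(10))  # 1200
```

Under these assumptions, ten complaints on the survey could mean over a thousand customers at risk, most of whom will never tell you why.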
Now if you’re just a tiny software shop that no one has heard of, you might be able to contain the damage. The unhappy customer complains to their friends who in turn quickly forget. When you start to gain name recognition, this problem becomes harder to handle. The unhappy customer complains to their friends who think “yeah, I’ve heard of them.” And those friends are more likely to remember how you burned their friends.
A perfect example
This is the real problem with simplistic cost/benefit analysis: you really have no serious way of quantifying the people aspect of the equation. When you earn a bad reputation, whether it’s amongst your employees or your customers, it’s very hard to shake that reputation. As a case in point, let me remind you about Internet Explorer (IE), Netscape Navigator and the “browser wars”. You remember Netscape Navigator? At one time it was hugely popular and that’s the browser I ran, but Microsoft poured lots of money and resources into IE because Netscape made the mistake of admitting that they were trying to offer a tool that worked regardless of which operating system you ran on. Microsoft realized this was dangerous and decided to kill Netscape. They poured a lot of time and money into the IE project and eventually created a browser that many people felt was a better product (I developed on both at the time and I agree, IE was better).
After killing Netscape, what did Microsoft do? Well, most of us have probably heard Microsoft’s claims that they foster innovation. This seems to contradict the fact that one of their most ubiquitous projects, Internet Explorer, languished at version 6 for years. Microsoft had no competition and despite frequent gripes from people about it not being standards compliant, being riddled with bugs and full of security holes, Microsoft just didn’t see any direct profit in updating it. Even before IE 6, Microsoft had already discontinued support for Macintosh. They claimed it was because Macs already had decent browsers but I suspect it’s because they didn’t see any profit. Internet Explorer 6 was so bad that PC World magazine voted it the eighth worst software product of all time.
Needless to say, when new Mozilla-based browsers such as Firefox came along, many people were ecstatic to use anything which wasn’t IE. One recent estimate suggests that Mozilla browsers now have about 13% of the market share. In Germany alone, Firefox is estimated to have 39% of the browser market share. So how does Microsoft respond? Well, they’re madly working on IE 7 and claim that they are already working on the next two versions after that. Given how badly they’ve damaged their reputation they have a lot of catching up to do but, unlike most companies, they have the funds to do it. I fully expect newer versions of IE to be very interesting.
It’s easy to conclude that some bugs are too expensive to fix. In fact, if you’re looking at a $10,000 bug fix which is going to get in the way of a $100,000 profit, maybe it is too expensive to fix. But what about broken windows? What about employee morale? What about your public reputation? These things are difficult to pin monetary figures on, but that doesn’t mean they’re not important. If you’re “on the edge” or close to it, do the safe thing and fix the bug. Your customers will love it. Your developers will love it. Your product will be better for it.