Many programmers feel they have bragging rights if they’ve written large systems. This isn’t always fair as many times a quick twenty-something line program might save the day and programmers who can crank them out shouldn’t be undervalued. Be that as it may, sometimes we need to write large systems and we need to know how to do it. But what if you’re just writing a small system? What’s small? And as many of us know, small systems stick around and often grow. While rules which affect larger systems don’t always seem as important on small systems, it’s fair to say that if you want your small systems to be able to grow to large systems, it doesn’t hurt to start with sane rules.

As many readers of The Daily WTF know, many large systems are written terribly and are hard to maintain and extend. In fact, the problem is pervasive enough, even with well-known software projects, that ultimately the conclusion is reached that the software must be rewritten from scratch. Usually this is a mistake. But if you stay with the existing system, programmers often wind up trapped in fear-based programming.

“We can’t alter that table because too many things rely on it!”

“We can’t get rid of that global, we don’t know what it might break!”

“We don’t know what this does, but we think it might be important!” (I’ve heard this far too many times.)

While those lamentations often drive programmers crazy, they’re often completely rational. If you don’t know what’s going to happen, you have to figure out if the risk is worth the reward. If the risk is difficult to quantify, code paralysis sets in. Servicing technical debt is painful and often embarrassing. However, here’s the hard bit: while fear-based programming may be rational, refusing to figure out a way out of the corner you’ve coded yourself into is a bad idea for an actively maintained system. But I’m not going to talk about paying off technical debt. I’m going to talk about not incurring it in the first place. That’s the trick to building large systems that are a joy to work on.

Eat Your Own Dog Food

Here’s a little secret that many “test-infected” developers know: testing makes you a better programmer. It’s not just that your code works. It’s that if you find something is hard to test, that’s a code smell. Maybe your superWunderFunction() which takes 13 arguments isn’t designed terribly well. That’s not saying that all hard-to-test code has a design flaw (GUIs, for example), but as you test more, you start writing code that’s easier to test.

Your functions will take fewer arguments. Your functions won’t try to do too many things. Your functions are more likely to be loosely coupled. You’ll have less reliance on global variables. The list goes on and on.

When you start writing code that is easier to test, do you know what you’re doing? You’re eating your own dog food. You’re using your code and you start writing code which is easier to use. It starts becoming better-designed code. As an added benefit, if programmers are unsure how to use your code, they can always read the tests. Tests are not a substitute for documentation, but they are an excellent supplement to it.
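
Here's a minimal sketch in Python of what "easier to test" tends to look like. Everything in it (the `Parcel` class, the `shipping_cost` function, the rate) is a made-up illustration, not code from a real system. The point is only that a small function with few arguments is easy to call from a test, and the tests then show other programmers how to call it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parcel:
    """Hypothetical value object that replaces a long argument list."""
    weight_kg: float
    express: bool = False


def shipping_cost(parcel: Parcel, rate_per_kg: float = 2.0) -> float:
    """Return the cost of shipping one parcel; express doubles the price."""
    if parcel.weight_kg <= 0:
        raise ValueError("weight must be positive")
    cost = parcel.weight_kg * rate_per_kg
    return cost * 2 if parcel.express else cost


# The tests are the first callers of the code, so they double as usage examples.
def test_standard_parcel():
    assert shipping_cost(Parcel(weight_kg=3)) == 6.0


def test_express_doubles_the_price():
    assert shipping_cost(Parcel(weight_kg=3, express=True)) == 12.0


def test_rejects_non_positive_weight():
    try:
        shipping_cost(Parcel(weight_kg=0))
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

If writing those three tests had been painful, that pain would have been the hint that the interface needed another look.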

Other End Up

There’s a long-standing joke about how [insert favorite group you like to pick on] has to drink from beer bottles with “other end up” stamped on the bottom. Sometimes it’s funny, often it’s offensive, but it hides an interesting truth: bottles are simple. It’s awfully tough to not figure out which way you’re supposed to drink from an open beer bottle. Good user interface. Ever put a condom on the wrong way? Bad user interface. User interface is important. Unfortunately, many programmers just throw together the first interface they think of. It becomes intuitive to them, but like a remote control you can’t figure out how to use, it’s a source of frustration for everyone else. This is a design flaw.

Don’t just sit down and start writing code. Sit down and think “how would my dream code work?” How can I write something so simple and easy to use that I can’t get it wrong? Then write the code for it. As an example from Perl (I’d write pseudo-code, but this is language specific), there’s a great module called Data::Dumper which allows you to print out variable contents as valid Perl code, even if they’re complex data structures. However, you get output like this:

$VAR1 = 'bob';
$VAR2 = [
  'one',
  'two'
];

What’s $VAR1? If you dump out a lot of variables, they rapidly get tough to follow. I wanted the variables to have their correct variable names. To do that with Data::Dumper you have to do something like this:

print Data::Dumper->Dump(
    [ $name, \@numbers ],
    [ qw/$name *numbers/ ]
);

That’s just ugly, but here’s what I wanted to do:

print Dumper( $name, @numbers );

Turns out that’s not that easy, but I figured out how to do it and now, with Data::Dumper::Simple, you get this:

$name = 'bob';
@numbers = (
  'one',
  'two'
);

This is incredibly useful for debugging and I’ve gotten a lot of thanks for it. I did this by imagining my “dream” code and figuring out how to make it happen (I’m not usually this smart. It was a good day). This is a principle you want to always follow. Heck, one thing which helps is just scribbling down some ideas and asking an unsuspecting developer “does this make sense?” If they have to ask any questions, maybe you can make it simpler still.

What? That’s not enough? Here’s how to print something to a file in Perl:

open FH, ">", "somefile.txt"           or die "Can't open file: $!";
print FH "This is written to a file\n" or die "Can't print to file: $!";

Now let’s look at one way to do this in Java:

import java.io.*;

class WriteFile {
    public static void main(String[] args) {
        // try-with-resources closes both streams, even if writing fails
        try (FileOutputStream foStream = new FileOutputStream("somefile.txt");
             PrintStream pStream = new PrintStream(foStream)) {
            pStream.println("This is written to a file");
        }
        catch (IOException e) {
            System.err.println("Error writing to file " + e);
        }
    }
}

Which do you think needs “other end up” instructions? (To be fair, the Perl API isn’t perfect, but damn, it’s one hell of a lot easier to use).

One Click to Rule Them All

I generally work for companies that do a lot of Web-based development. To deploy a new version of the Web site, the process is almost always a variation of:

  • Find the text file you saved the deploy instructions in.
  • Start following the steps, one by one.
  • Note when any steps are optional.
  • Note which steps have special instructions for them.
  • Curse vehemently when you’ve missed an instruction.
  • Undo the last three instructions.
  • Follow the missed instruction.
  • Continue with the rest of the instructions.
  • Go home and have a few drinks over a successful launch.
  • Get paged at 3:30 in the morning when the Web site crashes.
  • Work for an hour to fix the bug.
  • Find out you had an old copy of the deployment instructions.
  • Admit defeat.
  • Work for two hours reverting the Web site and database.

This is wrong. You need one-click install, one-click rollback. If you have to do something repeatedly, find a way to automate it, particularly if a mistake will cost more than the automation would.

Need to deploy the next version of code, including database changes? Automate it. Need to roll back those changes? Automate it. Need to check out a new code base and build a test website and database for it? Automate it. Your boss wants weekly status reports? Automate it.
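
One way to think about "one-click install, one-click rollback" is to pair every deployment step with the action that reverses it. The following Python sketch assumes exactly that and nothing more: the `Step`, `deploy` and `rollback` names are hypothetical, and the real work (copying code, migrating the database) would live inside the callables you pass in.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    """One deployment step plus the action that reverses it (both hypothetical)."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def rollback(steps: List[Step]) -> None:
    """Undo the given steps, most recent first."""
    for step in reversed(steps):
        step.undo()


def deploy(steps: List[Step]) -> bool:
    """Run every step in order; on failure, undo the completed ones in reverse."""
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as error:
            print(f"{step.name} failed ({error}); rolling back")
            rollback(done)
            return False
        done.append(step)
    return True
```

Note that this sketch does not undo the step that failed, only the ones that finished. Whether a half-applied step needs its own cleanup depends on the step, so that decision is left to whoever writes it.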

I’m not kidding about automating status reports (well, maybe a little). It can’t always be done, but if you can figure out a way to automate it, you’ll be much happier. One strategy is to make your source control commit messages meaningful and then write code which reads them and emails them. Make your email subjects meaningful and you can include in your status reports “Emailed Nancy about the ‘Smell in the Bathroom’”. If something needs to be done repeatedly and you can figure out how to automate it, you’ll save yourself much pain and headache later.
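
As a rough sketch of the commit-message idea, here is one way it might look in Python. It assumes your source control is Git and that the script runs inside the repository; the function names and the report wording are made up for illustration. The emailing step is left out, since how you send mail depends entirely on your environment.

```python
import subprocess
from typing import List


def commit_subjects(since: str = "1 week ago") -> List[str]:
    """Return commit subject lines from the Git repository in the current directory."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--pretty=format:%s"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def status_report(subjects: List[str]) -> str:
    """Format commit subjects as a plain-text bullet list."""
    if not subjects:
        return "No commits this week."
    return "This week:\n" + "\n".join(f"  * {subject}" for subject in subjects)
```

Calling `print(status_report(commit_subjects()))` from a weekly scheduled job would be enough to get started. The report is only as good as the commit messages, which is the whole reason to make them meaningful.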

The Price You Pay

OK, your code is well-tested. Your code is better designed. You have intuitive APIs that anyone can use. Most processes are automated to remove bug-prone grunt work. You’re well on your way to making a system that’s easy to use, refactor, and extend. But you’ve paid a price. You’ve front-loaded your costs.

Though some testing advocates deny it, writing tests can mean you’ve spent longer developing features. Sometimes it’s because you’re figuring out how to test something. Sometimes it’s because you’ve exposed a design flaw which requires a bunch of refactoring. Testing can simply take longer. And spending time up front creating a “dream” API can take longer, and such APIs are sometimes more difficult to implement than the quick hack. And trying to figure out how to automate something can take longer than just doing the actual task. For small systems, these costs can add up rapidly.

But I meant “front-loading” your costs. You’ve incurred less technical debt, which means you have less to pay later on. For most large systems I’ve worked on, the maintenance phase lasts much longer than the development phase, so everything you can do to reduce costs in the maintenance phase can pay off wonderfully, but you frequently have deadlines you have to meet. Writing tests means that if you change something, you’ll probably find out quicker if you break something, so changes to the system are easier to implement. Because you have good design and “dream” APIs, other developers can understand your code better. Because you automate everything, repetitive tasks don’t waste labor hours and are less fragile. But you need to save money now.

Don’t Sweat the Small Stuff

OH. MY. GOD! I can’t believe you wrote that dreck!

Ever heard a variation of that? Many programmers would be tempted to say it when they see something like this:

for i in array1
    for j in array2
        if i == j
            duplicates.add(i)

That’s just awful. If you have ten thousand elements in each array, this could be an awfully expensive routine.

So what? I don’t care. There’s an old saying that a sufficiently encapsulated hack is no longer a hack. When you see something like that, ask yourself three questions.

  1. Does it do what it’s supposed to do?
  2. Is it sufficiently encapsulated so that it’s easy to change if needed?
  3. Am I able to read the code easily?

If you answer “yes” to those three questions, ignore the “problem” and move on. You have work to do and squabbling about little issues and fixing problems which might not be problems is a waste of time (note that you probably shouldn’t answer “yes” to the first question if you don’t have tests for it.)
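
To make "sufficiently encapsulated" concrete, here is a minimal Python sketch. The function name `find_duplicates` and the test are assumptions made for illustration. The nested loops are kept exactly as naive as in the pseudo-code, but they sit behind one small function with a test, so all three questions can be answered with "yes".

```python
def find_duplicates(array1, array2):
    """Return items of array1 that also occur in array2 (naive nested loops)."""
    duplicates = []
    for i in array1:
        for j in array2:
            if i == j:
                duplicates.append(i)
    return duplicates


def test_find_duplicates():
    # The test is what lets you answer "yes" to question 1.
    assert find_duplicates([1, 2, 3], [3, 4, 1]) == [1, 3]
    assert find_duplicates([1, 2], [3, 4]) == []
    assert find_duplicates([], [1]) == []
```

Callers only see `find_duplicates`, so if the loops ever do become a problem, the body can be swapped out without touching anything else.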

Now the above code might seem like a newbie mistake and I confess that I have an almost pathological aversion to it, but I deliberately chose an example of code I despise to demonstrate an example of code which I’ll ignore, despite my feelings.

Here’s the problem: it’s not a performance issue until you’ve proven it’s a performance issue. What if it turns out each array can only have three elements? It’s probably not a performance issue, but that’s not obvious by just looking at it. Until you’ve proven there’s a problem, don’t fix it. I know this is a terribly controversial point for many programmers, but we shouldn’t forget that we have jobs to do. Constantly rewriting working code means we’re not getting new features written (refactoring is an obvious exception).
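
"Proving it" can be as cheap as timing the code with realistic input sizes. The sketch below uses Python's standard `timeit` module; the function names and the array sizes are assumptions chosen for illustration, and the set-based version is just one possible replacement. No timings are shown because they depend on your machine, which is rather the point.

```python
import timeit


def naive_duplicates(array1, array2):
    """Nested-loop version: compares every pair of elements."""
    return [i for i in array1 for j in array2 if i == j]


def set_duplicates(array1, array2):
    """Candidate replacement using a set lookup (items must be hashable).

    Gives the same result as the naive version when array2 has no repeats.
    """
    seen = set(array2)
    return [i for i in array1 if i in seen]


def seconds_per_call(func, array1, array2, repeats=1000):
    """Average wall-clock seconds for one call, measured with timeit."""
    return timeit.timeit(lambda: func(array1, array2), number=repeats) / repeats


if __name__ == "__main__":
    cases = {
        "small": ([1, 2, 3], [3, 4, 5]),  # the "three elements" case
        "large": (list(range(1000)), list(range(500, 1500))),
    }
    for label, (a, b) in cases.items():
        for func in (naive_duplicates, set_duplicates):
            print(label, func.__name__, seconds_per_call(func, a, b, repeats=5))
```

If the numbers for your real data look like the "small" case, leave the code alone. If they look like the "large" case and the routine is called often, you now have evidence rather than a hunch.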

Reduce Features

You’ve heard the joke. “Fast, good, or cheap. Pick two.” That’s three things: deadline, quality, and cost. Most people admit that you can’t get the best of all three of those. Rarely do bosses say “don’t worry if it’s any good”, so we take quality off the list. Sometimes there are legal or market reasons to beat a deadline, but often it’s a simple matter of “I want it done in three weeks”, so we take the deadline off the list. It’s common to have a boss say “I won’t pay for a Mac”, so you can’t easily test if your code compiles on OS X. Now we’ve taken cost off the list. We need it fast and cheap and good. We all know which of those three we can hide from the boss.

There’s another way, though. Does your spreadsheet really need the 3-D VRML graphs when you first launch? Does your screenplay authoring software really need to support remote collaboration at first? Does your budget management software really need to support MySQL, PostgreSQL, SQLite, Oracle, CSV files and cuneiform tablets? You’re not saying you won’t add these features, you’re just trying to focus on the features you need when you first launch. And guess what? You might find out that your customers really aren’t crying out for cuneiform support after all!

Plenty of times I’ve met deadlines by delaying less critical features until after the launch. If you’ve followed the above rules, you’ll often find out that those features are easy to add later.
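
As a sketch of how a delayed feature can stay easy to add, take the budget software example and ship with a single storage backend behind a small interface. Everything named here (`BudgetStore`, `SQLiteBudgetStore`, `total_cents`, the table layout) is hypothetical; only the `sqlite3` and `typing` calls are standard Python.

```python
import sqlite3
from typing import List, Protocol, Tuple


class BudgetStore(Protocol):
    """Hypothetical interface; the only part the rest of the program sees."""

    def add_expense(self, label: str, cents: int) -> None: ...

    def expenses(self) -> List[Tuple[str, int]]: ...


class SQLiteBudgetStore:
    """The single backend shipped at launch."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS expenses (label TEXT, cents INTEGER)"
        )

    def add_expense(self, label: str, cents: int) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute("INSERT INTO expenses VALUES (?, ?)", (label, cents))

    def expenses(self) -> List[Tuple[str, int]]:
        return self._db.execute(
            "SELECT label, cents FROM expenses ORDER BY rowid"
        ).fetchall()


def total_cents(store: BudgetStore) -> int:
    """Application code depends on the interface, not on SQLite."""
    return sum(cents for _, cents in store.expenses())
```

If customers later turn out to want PostgreSQL (or cuneiform), that means writing one more class with the same two methods, and `total_cents` never has to know.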

Conclusion

What I’ve outlined above is mostly the coding side. Building large systems might involve making appropriate hardware choices. It might involve carefully designing a network, understanding load balancing, choosing appropriate database software and a host of other things I’ve not covered, so the above list isn’t enough, but it’s a great start for coders. Many of us have seen horror stories about how small projects have grown to large projects, but we didn’t plan them to grow. I’ve written some of those horror stories. Sometimes we’ve wanted to start out large but we don’t know enough about building those systems to get there. By spending a little time up front testing, making things easy to use, and automating everything, you too can build large systems.