I practice annoyance-driven development. I set my threshold of annoyance low so that every time I feel frustrated by a technical limitation, I notice consciously. My intent is not to find technology endlessly frustrating (though that happens sometimes), but to identify the next most important thing to fix.
For example, Parrot has a large test suite. Several of those tests exercise the source tree as a whole, checking for copyright notices, Subversion ID strings, and metadata properties. I call these non-functional tests, because they exercise externalities of the project, not features of the code. Having accurate copyright notices and repository metadata (especially “Make sure these files have the proper platform-specific line endings”) is useful… but analyzing thousands of files in dozens of directories isn’t instantaneous.
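To make "non-functional test" concrete, here is a minimal sketch in Python. It is not Parrot's actual harness, which is Perl 5-based; the marker strings and function names are hypothetical, and a real check would match the project's own notice format.

```python
import os

# Hypothetical markers; a real project would match its own notice and keyword format.
COPYRIGHT_MARKER = b"Copyright"
SVN_ID_MARKER = b"$Id"


def check_file(path):
    """Return a list of problems found in one file."""
    with open(path, "rb") as handle:
        data = handle.read()
    problems = []
    if COPYRIGHT_MARKER not in data:
        problems.append("missing copyright notice")
    if SVN_ID_MARKER not in data:
        problems.append("missing Subversion ID string")
    return problems


def check_tree(root):
    """Walk every file under root and map path -> problems, skipping clean files."""
    report = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            problems = check_file(path)
            if problems:
                report[path] = problems
    return report
```

Every file gets opened and read in full, which is why a check like this slows down as the tree grows even though each individual check is trivial.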
Because we have several contributors, we attempt to keep all of our tests passing on all of the platforms to which we have regular access all of the time. (Exotic platforms like Windows aren’t always so fortunate. Porters wanted.) To achieve this, committers must be able to run the test suite before checking in changes.
Everything so far is obvious. What wasn’t immediately obvious to me was that there’s a threshold beyond which people will not run the entire test suite.
I first noticed this when I saw that the non-functional tests failed a couple of times a week. Someone usually came along to fix them shortly, but we spent more time noticing failures and fixing the tested externalities than we ever did fixing the problems these non-functional tests attempt to prevent.
Worse yet, I measured the performance of the test suite once and discovered that non-functional tests took up between 25% and 40% of the time for the entire test run.
There are a few possible solutions. One is to give up and use a continuous integration server to run all tests and eventually report failures. This has some advantages in that it can give us platform coverage for platforms to which we don’t normally have access (and if you’re willing to devote some cycles on a box or a VM for an OS/platform/compiler combination more exotic than GCC on GNU/Linux or FreeBSD on 32-bit x86, please let me know!). The disadvantage is that we’d give up what could be a very tight feedback cycle of test-code-refactor-test-commit. (I picked up this idea from James Shore; there’s a chapter devoted to this topic in The Art of Agile Development.)
Another solution is to drop the non-functional tests. Here we do run the risk of losing valuable information. Though we haven’t found very many real bugs from the tests lately, we have had real bugs in these areas in the past. It’s possible that we’ll avoid these bugs in the future, but it’s more likely that our project’s institutional memory will see a failure and say “Hmm, I think that’s a line-ending problem, so type this particular command to fix it….” This is a calculated risk.
Ultimately the right solution may be to enforce the metadata properties on the server side, such that it’s impossible to check files into the repository unless they pass some very brief sanity checks. The advantage here is that we can know that files in the repository are pristine, but the disadvantage is that someone has to set up and maintain that enforcement on the server, if it’s even possible.
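As a rough sketch of what a "very brief sanity check" could look like, assume a hook receives the changed files of a proposed commit as a mapping from path to new contents. The function name, the input shape, and the two checks are illustrative assumptions, not anything Parrot runs; wiring this into a real repository hook is left out.

```python
def sanity_check(changed_files):
    """Return rejection messages for a proposed commit.

    changed_files maps each path to the new file contents as bytes
    (a hypothetical input shape). An empty result means the commit
    may proceed.
    """
    errors = []
    for path, data in sorted(changed_files.items()):
        # Illustrative rule: treat CRLF as the wrong line ending.
        if b"\r\n" in data:
            errors.append(f"{path}: CRLF line endings")
        # Illustrative rule: non-empty files must end with a newline.
        if data and not data.endswith(b"\n"):
            errors.append(f"{path}: no newline at end of file")
    return errors
```

Because the check only looks at the files in one commit, it stays fast no matter how large the whole tree grows, which is the appeal of this approach over scanning everything in the test suite.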
For now, we decided to move the non-functional tests to a directory of tests we run less frequently — every week or two, and always right before our monthly release.
This isn’t the only solution, but it’s a big step toward speeding up our test suite further. There are other possibilities as well — porting more tests from our Perl 5-based harness to native Parrot is often valuable (and we’d love to help you learn how to do this). Running tests in parallel may take advantage of blocking IO and multiprocessing to offer nice gains as well.
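The parallel idea can be sketched in a few lines of Python. This is only an illustration under the assumption that each test file runs as its own independent command; the function names and the worker count are hypothetical, and it is not how Parrot's harness works.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one test command and return (command, passed)."""
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return command, result.returncode == 0


def run_parallel(commands, workers=4):
    """Run test commands concurrently and return results in input order.

    Threads are enough here: each one spends its time blocked,
    waiting on a child process that does the real work.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

The catch is that tests sharing temporary files or other state can interfere with each other once they run side by side, so a suite usually needs some cleanup before it parallelizes safely.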
The important lesson from all of this for me is to pay attention to repetition and other painful parts of the process, to ask why they exist, and to focus my energy on making that painfulness go away. It’s one of the foundational principles of Parrot (take the pain out of writing functional, performant, cross-platform, powerful compilers), and it serves any technical person well on any project.
To learn more about speeding up test suites, I recommend the work of my colleague Curtis “Ovid” Poe, in particular Order Restored and Speeding up long-running test suites. He’s submitted a talk for OSCON about these techniques.
I’ve submitted a talk to OSCON about repairing organizational and technical damage in development projects, so I’ve been listing antipatterns and their solutions along these lines.

"Exotic platforms like Windows..."
That gave me a chuckle. Thanks.
Why not have the non-functional tests run on the CI server, and let the developers have at the functional tests? And why is it so hard for you to run a subset of tests?
The CI server is a good idea in any case. Sometimes developers get non-virtuously lazy and/or tired and forget to run their test suite before they check in, and it's good to catch that ASAP. Plus you get exotic platforms. :)
Drop me an e-mail. I've got a spare AMD 64 box kicking around, and I might be able to spare some clocks on a Windows machine. The two of them are shortly going to be turned into local repositories, CI servers, and BOINC clients. Might as well donate them for good use.
I'd also recommend people check out Improving Test Performance. That quickly covers about a third of what the proposed talk will cover.
Your comments also highlight something I plan to bring up at the beginning: trade-offs. Just about every technique I introduce will involve trade-offs (particularly process separation issues), and developers will have to know those and weigh them carefully against proposed benefits.
It sounds like the testing isn't divided into appropriate domains. I think you should keep your tests close to the problem at hand. So perhaps you should have your standard make/ant target run the full compile/run tests, and defer the nonfunctional tests to your checkin target.
This gives you the benefits of maximum speed during your most-frequent activity (edit/make/run), and you're still covered during your checkins. You may find this a bit less frustrating, as when you're checking in your code, you're at a natural pause point anyway.
...roboticus