Preface: if you love XML, that’s fine. I’ve nothing against the technology per se, but it’s not always the best tool for the job.

I’ll be in Copenhagen next weekend for the Nordic Perl Workshop giving a talk about multi-language test suites. This will be based on the work done with TAP::Parser and it will contain a brief discussion of TAP (the Test Anything Protocol), a protocol which is almost 20 years old and is gaining in popularity.

One question I’m sometimes asked by those not involved with TAP is why we don’t use XML for our test results. This is a brief attempt to answer that.

Years ago, I used to sell cars. We would usually have a row of “problem” cars near the back. Sometimes the driver side door wouldn’t open. Other times, the engine wouldn’t start unless you held the steering wheel “just so”. Many of these cars ran just fine, so long as you could get into them and figure out how to start them. I like to think of these cars as the automotive equivalent of XML.

I’d be quite happy if I had a Euro for every broken XML generator or consumer out there. Many XML proponents gnash their teeth at this, claiming that XML is simple and that people shouldn’t have a problem with it. Well, here’s a little demo I want to do at a conference some time. I want to ask for a show of hands of how many people know XML. Then to encourage them, I’ll hold up my hand.

After looking at the sea of hands in front of me, waving gently back and forth, I’ll then hit them with my follow-up question: “How many of you know what an unbound prefix is and how to fix it?” I’ll explain that if they hold their hand up, I might just call on them and ask them technical questions about this. I fully expect the sea of hands to evaporate to a puddle. People say they know XML but when you get down to the details, they don’t.

Real-world problems implementing XML are legion. One programmer complained to me about a contract won by a company which promised “secure, portable document exchange between government entities”. They were going to use XML as their portable format. Part of their security involved generating a digest for the XML document. The entire document. As a string. Have extra whitespace? Too bad. Are your attributes in the wrong order? Too bad.
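To see why a digest over the raw string is so fragile, here’s a minimal sketch in Ruby. The sample documents are made up, and SHA-256 is my assumption; the story doesn’t say which digest that company used. The three strings describe the same element, differing only in attribute order or in whitespace between tags.

```ruby
require 'digest'

# Hypothetical sample documents: same element, different serializations.
original  = '<doc id="1" lang="en"><title>Report</title></doc>'
reordered = '<doc lang="en" id="1"><title>Report</title></doc>'
indented  = "<doc id=\"1\" lang=\"en\">\n  <title>Report</title>\n</doc>"

# Digest of the document as a plain string, with no XML-aware normalization.
def digest_of(xml)
  Digest::SHA256.hexdigest(xml)
end

puts digest_of(original) == digest_of(reordered)  # prints "false"
puts digest_of(original) == digest_of(indented)   # prints "false"
```

Any consumer that re-serializes the document before checking the digest will see a mismatch, even though nothing meaningful changed.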

Another example, one that caused me much grief, was implementing Yahoo!’s IDIF format for a Web site. Yahoo! purchased IDIF from Inktomi and I’m sure that the Inktomi salespeople must have been giving themselves high fives for selling one of those back lot cars. Except not only is it difficult to get into this car, its engine keeps misfiring. Let’s look at how Yahoo! describes IDIF:

The IDIF format consists of a stream of documents and associated metadata. IDIF is similar to XML, and follows XML rules of syntax.

Yes, IDIF is similar to XML. However, it does not follow XML rules of syntax. You basically wrap unescaped HTML documents in XML tags. You’re even forced to declare your document as XML with <?xml version="1.0"?>, even though the document is clearly not XML. Heck, even though you are not allowed to escape your HTML in the document, you’re not required to provide valid XHTML (and you can’t use CDATA, either, though their specification is very vague about what’s allowed)! If you’re curious about how poorly this has been handled by Yahoo!, you can read my blog entries about my IDIF adventures. (Ironically, Yahoo!’s IDIF description does not properly escape all of its examples, so some examples never appear on the page.)

I’ve encountered many broken XML parsers and generators, many people don’t know XML terribly well, and even huge corporations get it wrong all the time. But what about TAP? Here’s a TAP document:

1..7
ok 1 - input file opened
not ok 2 - first line of the input valid
ok 3 - read the rest of the file
not ok 4 - test protocol
#   Failed test 'test protocol'
#   in t/mytest.t line 13.
#          got: 'XML'
#     expected: 'Happiness'
ok 5 # SKIP (Don't fork on Windows)
ok 6 - you shall not pass!
ok 7 - Gandalf wins.  Game over.

Hmm, that’s pretty easy to read. And guess what? That’s machine readable, too. Since it’s mostly line-oriented, a parser doesn’t have to wait for a ‘well-formed’ TAP document before reading it. If the parser encounters a line it doesn’t recognize, it discards that line (this lets us be forward compatible). And how easy is it to generate? Here’s a simplistic, but valid, TAP generator in Ruby:

#!/usr/bin/ruby -w

class Test
    def initialize
        @count = 0    # number of tests run so far
    end

    # Print one TAP result line: "ok N - description" or "not ok N - description"
    def ok( result, description = "[no description]" )
        @count += 1
        print "not " unless result
        puts "ok #{@count} - #{description}"
    end
end

test = Test.new
test.ok 1 == 1, "one is one"
test.ok 0 == 1

As your needs grow, so would a TAP producer, but really, it’s very, very easy to produce valid TAP. Heck, up until about five minutes ago, I didn’t know how to write a class in Ruby! (That says as much about how easy Ruby is as it does about TAP, I suppose.) Here’s what the above generates:

ok 1 - one is one
not ok 2 - [no description]

So, what would that look like in XML?

 <?xml version="1.0" encoding="utf-8"?>

 <TestResults xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <TestResults>
         <result id="0001"> 
             <passfail>Pass</passfail>
             <descriptin>one is one</description>
         </result>
         <result id="0002">
             <passfail>Fail</passfail>
             <descriptin>[ no description ]</description>
         </result>
     </TestResults>
 </TestResults>

(Adapted from: http://msdn.microsoft.com/msdnmag/issues/06/06/TestRun/)

Heck, why don’t you write a generator for that? Yeah, go ahead. I’m waiting. It carries the same results as the output of the Ruby program, but it’s far, far more verbose. And did you see the mistake in the XML? With TAP, you have to work hard to get an invalid test result line.
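The consuming side is about as small. Here’s a rough sketch of a line-at-a-time TAP reader in Ruby, written to illustrate the “discard what you don’t recognize” behaviour described earlier. It is not how TAP::Parser works; parse_tap_line and the hash keys it returns are names I made up, and it ignores parts of TAP such as TODO directives and “Bail out!” lines.

```ruby
# Hypothetical helper: classify one line of TAP.
# Returns a hash for lines it understands and nil for everything else.
def parse_tap_line(line)
  line = line.chomp
  if (m = line.match(/\A1\.\.(\d+)\s*\z/))
    { type: :plan, planned: m[1].to_i }
  elsif (m = line.match(/\A(not )?ok\b\s*(\d+)?\s*(.*)\z/))
    { type: :test,
      passed: m[1].nil?,                       # no "not " prefix means a pass
      number: m[2]&.to_i,                      # test numbers are optional
      description: m[3].sub(/\A-\s*/, ''),     # any directive is left in place
      skip: m[3].match?(/#\s*SKIP\b/i) }
  elsif line.start_with?('#')
    { type: :comment, text: line }
  end                                          # unrecognized line: nil
end

tap = <<~TAP
  1..3
  ok 1 - input file opened
  not ok 2 - first line of the input valid
  #   Failed test
  some line from a future version of the protocol
  ok 3 # SKIP (Don't fork on Windows)
TAP

# Each line is handled on its own; unknown lines are simply dropped.
events = tap.each_line.map { |l| parse_tap_line(l) }.compact
tests  = events.select { |e| e[:type] == :test }
failed = tests.reject { |e| e[:passed] }.map { |e| e[:number] }

puts "#{tests.size} tests, failed: #{failed.inspect}"  # prints "3 tests, failed: [2]"
```

Because every line stands alone, the reader can report results while the test is still running, and the unknown line in the middle costs nothing.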

If you would like to know more about TAP and how you can use it, sign up for the mailing list (it’s language agnostic) or visit testanything.org. Both of those resources are relatively new. TAP has long been shepherded along by Perl programmers, but with implementations in PHP, C, JavaScript, PostgreSQL (PostgreSQL?) and many other languages, we’ve started to move away from our long-standing Perl-oriented resources.