Preface: if you love XML, that’s fine. I’ve nothing against the technology per se, but it’s not always the best tool for the job.
I’ll be in Copenhagen next weekend for the Nordic Perl Workshop giving a talk about multi-language test suites. This will be based on the work done with TAP::Parser and it will contain a brief discussion of TAP (the Test Anything Protocol), a protocol which is almost 20 years old and is gaining in popularity.
One question I’m sometimes asked by those not involved with TAP is why we don’t use XML for our test results. This is a brief attempt to answer that.
Years ago, I used to sell cars. We would usually have a row of “problem” cars near the back. Sometimes the driver side door wouldn’t open. Other times, the engine wouldn’t start unless you held the steering wheel “just so”. Many of these cars ran just fine, so long as you could get into them and figure out how to start them. I like to think of these cars as the automotive equivalent of XML.
I’d be quite happy if I had a Euro for every broken XML generator or consumer out there. Many XML proponents gnash their teeth at this, claiming that XML is simple and that people shouldn’t have a problem with it. Well, here’s a little demo I want to do at a conference some time. I want to ask for a show of hands of how many people know XML. Then to encourage them, I’ll hold up my hand.
After looking at the sea of hands in front of me, waving gently back and forth, I’ll then hit them with my follow-up question: “how many of you know what an unbound prefix is and how to fix it?” I’ll explain that if they hold their hand up, I might just call on them and ask them technical questions about this. I fully expect the sea of hands to evaporate to a puddle. People say they know XML but when you get down to the details, they don’t.
Real-world problems implementing XML are legion. One programmer complained to me about a contract won by a company which promised “secure, portable document exchange between government entities”. They were going to use XML as their portable format. Part of their security involved generating a digest for the XML document. The entire document. As a string. Have extra whitespace? Too bad. Are your attributes in the wrong order? Too bad.
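To make the digest problem concrete, here is a minimal Ruby sketch. The two documents and the choice of SHA-256 are assumptions for illustration; the algorithm the contractor actually used isn't stated above. Both strings describe the same record, yet hashing them as raw strings gives unrelated digests:

```ruby
require "digest"

# Two hypothetical documents carrying the same information:
# only attribute order and whitespace differ.
doc_a = %(<record id="42" status="open"><name>Example</name></record>)
doc_b = %(<record status="open" id="42">\n  <name>Example</name>\n</record>)

# Hash each document as a plain string (SHA-256 is an assumed choice).
digest_a = Digest::SHA256.hexdigest(doc_a)
digest_b = Digest::SHA256.hexdigest(doc_b)

puts digest_a
puts digest_b
puts "digests match: #{digest_a == digest_b}"   # prints "digests match: false"
```

Any consumer that re-serializes the document before checking the digest will therefore reject a perfectly good message.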
Another example, one that caused me much grief, was implementing Yahoo!’s IDIF format for a Web site. Yahoo! purchased IDIF from Inktomi and I’m sure that the Inktomi salespeople must have been giving themselves high fives for selling one of those back lot cars. Except not only is it difficult to get into this car, its engine keeps misfiring. Let’s look at how Yahoo! describes IDIF:
The IDIF format consists of a stream of documents and associated metadata. IDIF is similar to XML, and follows XML rules of syntax.
Yes, IDIF is similar to XML. However, it does not follow XML rules of syntax. You basically wrap unescaped HTML documents in XML tags. You’re even forced to declare your document as XML with <?xml version="1.0"?>, even though the document is clearly not XML. Heck, even though you are not allowed to escape your HTML in the document, you’re not required to provide valid XHTML (and you can’t use CDATA, either, though their specification is very vague about what’s allowed)! If you’re curious about how poorly this has been handled by Yahoo!, you can read my blog entries about my IDIF adventures. (Ironically, Yahoo!’s IDIF description does not properly escape all of their examples, so some examples never appear on the page.)
I’ve encountered many broken XML parsers and generators, many people don’t know XML terribly well and even huge corporations get it wrong all the time, but what about TAP? Here’s a TAP document:
1..7
ok 1 - input file opened
not ok 2 - first line of the input valid
ok 3 - read the rest of the file
not ok 4 - test protocol
# Failed test 'test protocol'
# in t/mytest.t line 13.
# got: 'XML'
# expected: 'Happiness'
ok 5 # SKIP (Don't fork on Windows)
ok 6 - you shall not pass!
ok 7 - Gandalf wins. Game over.
Hmm, that’s pretty easy to read. And guess what? That’s machine readable, too. Since it’s mostly line-oriented, you don’t have to wait for a ‘well-formed’ TAP document before reading it. If a parser encounters a line it doesn’t recognize, it discards it (this lets us be forward compatible). And how easy is it to generate? Here’s a simplistic, but valid, TAP generator in Ruby:
#!/usr/bin/ruby -w

class Test
  def initialize()
    @count = 0
  end

  def ok( result, description = "[no description]" )
    @count = @count + 1
    print "not " unless result
    puts "ok #{@count} - #{description}"
  end
end

test = Test.new()
test.ok 1 == 1, "one is one"
test.ok 0 == 1
As your needs grow, so would a TAP producer, but really, it’s very, very easy to produce valid TAP. Heck, up until about five minutes ago, I didn’t know how to write a class in Ruby! (That says as much about how easy Ruby is as it does about TAP, I suppose). Here’s what the above generates:
ok 1 - one is one
not ok 2 - [no description]
So, what would that look like in XML?
<?xml version="1.0" encoding="utf-8"?>
<TestResults xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TestResults>
<result id="0001">
<passfail>Pass</passfail>
<descriptin>one is one</description>
</result>
<result id="0002">
<passfail>Fail</passfail>
<descriptin>[ no description ]</description>
</result>
</TestResults>
</TestResults>
(Adapted from: http://msdn.microsoft.com/msdnmag/issues/06/06/TestRun/)
Heck, why don’t you write a generator for that? Yeah, go ahead. I’m waiting. It’s the same results as what the Ruby program would generate, but far, far more verbose. And did you see the mistake in the XML? With TAP, you have to work hard to get an invalid test result line.
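The consuming side is about as short. What follows is a rough sketch of a line-oriented reader, not TAP::Parser and not the full TAP grammar: the `parse_tap_line` helper and its regular expressions are made up for illustration and cover only the constructs shown in this post (the plan, test lines, and `#` diagnostics). Each line is classified on its own, and anything unrecognized is tagged as unknown and skipped rather than aborting the run.

```ruby
# Hypothetical helper: classify a single line of TAP.
# Covers only the plan, test lines, and "#" diagnostics.
def parse_tap_line(line)
  case line.chomp
  when /\A(\d+)\.\.(\d+)\z/
    { type: :plan, first: $1.to_i, last: $2.to_i }
  when /\A(not )?ok\b\s*(\d+)?\s*(?:-\s*)?(.*)\z/
    { type: :test, ok: $1.nil?, number: $2 && $2.to_i, description: $3 }
  when /\A#\s*(.*)\z/
    { type: :diagnostic, text: $1 }
  else
    { type: :unknown, text: line.chomp }   # the caller simply ignores these
  end
end

tap = <<~TAP
  1..4
  ok 1 - input file opened
  not ok 2 - first line of the input valid
  # Failed test 'first line of the input valid'
  this line is not TAP at all
  ok 3 # SKIP (Don't fork on Windows)
  ok 4 - read the rest of the file
TAP

# Every line is handled independently, so one bad line cannot
# invalidate the lines before or after it.
results = tap.each_line.map { |line| parse_tap_line(line) }
tests   = results.select { |r| r[:type] == :test }
failed  = tests.reject { |r| r[:ok] }
plan    = results.find { |r| r[:type] == :plan }

puts "planned: #{plan[:last]}"                                         # planned: 4
puts "ran: #{tests.size}, failed: #{failed.map { |r| r[:number] }.inspect}"  # ran: 4, failed: [2]
puts "ignored: #{results.count { |r| r[:type] == :unknown }}"          # ignored: 1
```

Because nothing depends on a closing tag, the same loop works just as well on a stream that is still being written.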
If you would like to know more about TAP and how you can use it, sign up for the mailing list (it’s language agnostic) or visit testanything.org. Both of those resources are relatively new. TAP has long been shepherded along by Perl programmers, but with implementations in PHP, C, JavaScript, PostgreSQL (PostgreSQL?) and many other languages, we’ve started to move away from our long-standing Perl-oriented resources.


I wrote a basic TAP implementation in GNU Forth for giggles last night :)
Andy: I can believe it! TAP is ridiculously easy to implement. That's part of the reason why it's available in so many languages. So far I've heard of implementations in:
Outside of Perl, it's most popular in the PHP world, though I've seen it used heavily in C and C++.
>> "how many of you know what an unbound prefix is and how to fix it?"
Isn't that a bit of a loaded question? I mean, as soon as someone states "an unbound prefix is a prefix without a matching namespace declaration and to fix it you simply add an xmlns:prefix="uri:pick-a-namespace-any-namespace" within proper document context in regards to the first use of the prefix" I can only guess what your response is going to be,
Q: "So what is the proper document context?"
A: "prefix and namespace declared on or before the first prefixed element in the document."
Q: "Okay, so then what about the processor? What happens if the namespace of the prefix referenced in the XML document doesn't match the prefix of the same referenced namespace inside of the processor?"
A: "The namespace is what matters, and is what is used to bind elements to their matching processing instruction inside of the processor."
Q: "So what if there are no matching namespaces inside of the processor?"
A: "Depends on the processor. Anything from nothing to the text nodes being output from the elements that didn't match any of the rules/instructions and therefore defaulted to built in rules/instructions that output the text nodes contained inside of the element"
Q: "So then you would get the incorrect and/or unexpected result?"
A: "Yep. Just like with TAP if you haven't spent the time to understand what it is you are doing in the first place."
In fairness to XML, it's often better used for documents rather than data. And namespaces trip up everybody, which is really scary. In a recent training course for an XML database, the namespaces section consistently tripped up everybody in the class, even those with 5+ years of continuous XML experience. Quite depressing.
Anyway, regarding digests of XML. That's what Canonical XML was invented for. My XML::Genx module will output in this format.
Also,
>> And did you see the mistake in the XML?
For those who understand how an XML writer actually works, they would understand that to get,
<descriptin>one is one</description>
as the misspelled output would not be possible unless you had a faulty XML writer, as the opening and closing element are written from the same memory location which has been held in that location until such time as the signal to close the tag has been given. And given that there are literally hundreds of XML writers out there that have been tested within an inch of their life to ensure proper conformance, to get the above output would mean,
1) You chose to write your own XML writer instead of using the XML writer that comes as standard issue as part of each and every respected language and/or platform on the planet.
2) You chose to hack together a half a$$ solution instead of taking the time to think through the process of writing a proper XML writer.
3) Either of which, of course, would showcase that you have either no clue what you're doing, or could care less about spending the time to write software that actually works, and if it's the latter of the two, then your problem has nothing to do with XML, and *everything* to do with having chosen the wrong career path.
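For readers unfamiliar with the writer behaviour described above, a rough sketch may help. `TinyXmlWriter` is a hypothetical toy, not any real library's API: it pushes each element name on a stack when the element is opened and pops it to build the closing tag, so the two tags cannot be spelled differently.

```ruby
# Hypothetical minimal writer, for illustration only.
class TinyXmlWriter
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

  def initialize
    @out  = +""   # unfrozen output buffer
    @open = []    # stack of currently open element names
  end

  def start_element(name)
    @open.push(name)
    @out << "<#{name}>"
    self
  end

  def text(content)
    @out << content.gsub(/[&<>]/, ESCAPES)
    self
  end

  # The closing tag is built from the name stored at open time.
  def end_element
    name = @open.pop or raise "no open element"
    @out << "</#{name}>"
    self
  end

  def to_s
    raise "unclosed elements: #{@open.join(', ')}" unless @open.empty?
    @out
  end
end

w = TinyXmlWriter.new
w.start_element("result")
w.start_element("description").text("one is one").end_element
w.end_element
puts w.to_s   # <result><description>one is one</description></result>
```

A mismatch like the one in the post can only arise when markup is assembled by hand as strings, which is the case the post was illustrating.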
Maybe it's just me, but shouldn't the focus be less about "Test Anything" and more about "Test Always"? If it's not, why not? It seems to me that if you place the core focus on 'testing anything' that comes your way as opposed to testing continuously as you develop your code, the result tends to be that of "Test Anything. Always Testing." instead of "Test Always. Spend the Rest of Your Time Doing Anything You Want."
M David Peterson: the point is that TAP is simple. Even if a line of TAP is invalid, it doesn't ruin the parsing of the entire document. There are plenty of broken XML parsers and generators out there but TAP is so ridiculously easy that just a minute or two of reading through the spec gives you a grasp of just about everything. The same cannot be said for XML. The problem space for test results is relatively restricted compared to the problem spaces that XML is suitable for and having the full power of XML necessarily introduces the complexity of XML.
TAP is gaining popularity in part because it's not XML. It may not be appropriate for your needs and that's fine, but for those who find it suitable, they're quite happy to have something so easy to use.
Don't forget TAP generators written in Perl 6 as well as PIR (Parrot's native programming language). I'm especially proud of the latter because it allows me to write the tests for the Pheme programming language (built on Parrot) in Pheme itself.
@Ovid,
Fair enough. These are some good points. Thanks for taking the time to bring them up!
@chromatic,
>> PIR (Parrot's native programming language).
Seems I need to do some research. First I've heard of this, though that's not surprising given that I have about as much experience with Perl as I do Fortran. ;-)
@Ovid,
One thing I have been dying to ask you ever since the first time I saw the pic you have on your profile: What is the object in the left portion of the photo?
M. David Peterson. You know, I should create an "Ovid FAQ" and put that question near the top of the list.
A friend of mine is a professional photographer (link not safe for work) and he invited me over for a photo shoot (not an "adult" one, I should add). While we were there, he started playing around with other ideas and said "here, hold this". Neither of us are sure what it is, but we think it's part of a TV set, from the cathode ray tube. You can see a slightly larger version at my Perlmonks page.
@Ovid,
>> You know, I should create an "Ovid FAQ" and put that question near the top of the list.
That would be fantastic! Thanks! ;-)
re: You friends site: One word: Ouch!
re: The bigger pic: thanks, that helps!
s/you/your
Uh-oh, here it comes...
Everybody takes a drink!
I wonder if this is any indication of the number of people affected by this corner-case.
One example, and "another example" which isn't even XML? That hardly supports a claim of "legion".
I bet the contractor could've found a way to mis-implement TAP just as poorly as XML. Don't blame the syntax for the implementation.
I'm not sure what you are getting at in this article. Is TAP a good format for narrative documents? Why is XML not suitable for test results? Would some other standard structured format, like JSON, be better?
The problem is everyone has a favorite pet format that is perfectly suited to their way of thinking, which is sometimes not sufficiently flexible to accommodate an XML approach. This pattern will always lead to dozens of independent libraries for nearly identical problem spaces (slightly differing only in how a small group perceives the problem space) with incomplete platform or environment support. For example, how many .NET implementations does this language-agnostic format include? None? That's a pretty big hole. Java? For a ten-year-old format, implementations are pretty narrow.
How well will this format scale to unanticipated functionality in the future? How extensible is it?
Are you kidding? I was done at "Heck". I'd still be learning your niche format! Dammit, I'm getting tired of having to learn pet formats!
If space is such a concern, compress the XML.
Yes, because consistently working in a common format has attuned me to it. If I had to learn a dozen niche formats, I wouldn't be any good at seeing errors in any of them.
@Brianary, I thought the point of XML was to create niche formats. You certainly don't get semantics for free.
Everybody takes a drink!
Why are you being rude to a complete stranger? But then, that's the curse of the Web, eh?
As for your comments, I've learned a long time ago that there's a strange problem with discussing issues. I have to include examples of the issue to make people come to grips with it, but when I do that, people focus so much on one or two examples that they seem to miss how they relate to the larger issue. I suspect this has something to do with relatively shorter attention spans people seem to have today. They focus on a paragraph and not the point. When you wondered "if [Ovid's example] is any indication of the number of people affected by [the unbound prefix] corner-case", it's clear that I didn't get my point across. There are plenty of other questions I could have asked and gotten similar results. Please don't focus on the unbound prefix example.
One example, and "another example" which isn't even XML? That hardly supports a claim of "legion".
This is a blog entry, not a court case. There's no way I could have provided an exhaustive list and it certainly wouldn't be appropriate to do so here. As for the second example, if yet another vendor supplies yet another pseudo-XML format (I've hit plenty and I'm sure others have too) and you don't see how this relates, we'll just have to agree to disagree.
I bet the contractor could've found a way to mis-implement TAP just as poorly as XML.
I suspect that you didn't read the other responses. You can learn the basics of TAP in a couple of minutes. Its flexibility and tolerance make it ridiculously easy to implement.
Don't blame the syntax for the implementation.
Of course I will. With so many people implementing XML incorrectly, it's fair to ask why. Simply saying "they didn't read the spec!" ignores the question "why?" If XML is so simple, why do so many people get it wrong? TAP is ridiculously simple and people get it right.
OK, I'll stop addressing your points one by one and I'm sure others are tired of it, but I can't ignore this one:
If space is such a concern, compress the XML.
Look at the XML snippet I wrote and the TAP snippet which I had before that. They represent the same data. Verbosity affects legibility. There's no way around that. Compression has nothing to do with that.
On the off chance that you feel I have stopped addressing individual points because I can't, please feel free to email me. The domain is cpan.org and I'm "ovid". There's a lot I didn't cover because this is just a blog entry, not an indictment.
Oops. In case it's not clear, my previous response was @Brianary.
TAP looks nice, but what gives it better juice than YAML? (I'll admit I don't know YAML's rules very well and that from what I *have* seen, it can get quite complex).
Calm down, calm down. The problem with rudeness is that it is extremely locale-dependent, and therefore utterly subjective. Imagine a smiley in the first reply.
My problem with that opening is that the "I have not come to praise Caesar..." bit comes off as offensively condescending sometimes, and I've heard it about XML in particular so often, I just had to call it out.
Perhaps I've misinterpreted this post, but it appears to be a piece of persuasive writing, a logical argument. If this is not the case, just ignore my replies completely.
I guess you may have been starting from the premise that "XML is bad in many cases", rather than attempting to establish that. If that is a premise, and not what you were trying to show, then everything follows fine. It isn't a premise I agree with, but maybe I'm not the intended audience, either.
Otherwise, don't expect to convince anyone based on one example and one non sequitur. Supporting links to previous discussions (particularly any that helped to form your opinion on the subject) would make the post more factual, and less like flamebait.
So many? Really?
Clearly we code in very different circles. I just haven't seen that many poor implementations of XML. Most of the code I see uses the libraries that come with whatever standard language library for building and parsing, and typical implementations don't even venture into namespaces.
In any case, you didn't make even a cursory enumeration of problem XML implementations, so I guess this is also a premise. Of course, if you just assume all the hard topics, and ignore anyone that suggests re-examination of them, I just don't see the point of posting or discussion at all.
Granted, but this only applies if you are editing XML in a low-features text editor like MS Notepad or gedit that can't help much. Add syntax highlighting, and I'd wager the effect on readability is pretty negligible. Add an XML editor or specialized UI, and the issue is entirely moot anyway.
Hmm... I hit Post rather than Preview, so that last reply is probably riddled with errors.
TAP looks so clean and easy compared to XML but is limited to 3 fields per line. You could make TAP a bit more flexible by using delimiters other than spaces. Change
ok 1 input file opened
ok 1 - input file opened
to something like
Limiting an entry to one line is a pig of a restriction. You could wrap all the values in one element like so
ok 1
input file opened
That 1..7 looks weird. It would be easier to wrap the result set in an element such as .
Yeah, that pretty much fixes up TAP to be flexible. Maybe just an extra element at the start to indicate the version of TAP and maybe some pointers to the definition of TAP so people know where to look when they find a TAP file.
There are a multitude of line-oriented formats; CSV is of course the major one: you can even get subfields by using different delimiters.
The trouble with line-oriented formats is that text editors often add newlines themselves, when auto-wrapping. You can corrupt the document just by opening it in an editor! And these are very difficult to detect, if you have full lines. These issues and their trade-offs were well-known and discussed in the 70s and early 80s when GML and SGML were developed: the terseness and formatting of line+tab oriented formats is not new. (In fact, SGML allowed line-oriented sub-formats to be declared; you could embed TAP or troff inside angle-bracket containers.)