I wrote an article the other day on how Google decided NOT to use XML for a project they recently open sourced. I received a LOT of very opinionated responses to that post. Unfortunately every one was from a complete MORON.

Ok - I admit it - that last statement was flamebait. But I could not help myself - the comments were completely off the mark (up). Let me make it clear: I KNOW XML is not the best format for everything. For example - I would not try to serialize video with characters and tags. But I might ENCLOSE it.

Did anyone actually LOOK at the .proto format? They added CONTEXTUAL metadata to describe whether data is required, optional, or an enumeration. Noise to signal entropy INCREASES for edge cases as you ADD specific placement requirements. Here it is in simple terms: One commenter posted that you can encode a single digit number in three bytes and that using XML would take many more. He is correct - except for EXCEPTIONS. I can code a single digit number in ONE byte, but what happens when I want to represent a very long number? Another example: I know CSV files can contain more information for the same file size because they state the headers only once, but _ONLY_ for tabular data. XML is better BECAUSE it is more flexible.
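To put rough numbers on the byte-counting argument, here is a minimal sketch of a base-128 variable-length integer (the "varint" style that Protocol Buffers documents for integers) next to the size of the same number wrapped in an XML element. The function names `encode_varint` and `xml_size` and the `id` element name are my own choices for illustration, and the sketch only handles non-negative integers.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F           # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def xml_size(tag: str, n: int) -> int:
    """Byte length of <tag>n</tag> in UTF-8 (the tag name is hypothetical)."""
    return len(f"<{tag}>{n}</{tag}>".encode("utf-8"))


if __name__ == "__main__":
    # Print value, varint size in bytes, XML element size in bytes.
    for value in (7, 300, 2**62):
        print(value, len(encode_varint(value)), xml_size("id", value))
```

The small number fits in one byte, but the encoding grows with the value, so the savings over text are real without being a fixed ratio. That is the edge case I was pointing at.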

Here is an example from Google:
message Person
{
    required int32 id = 1;
    required string name = 2;
    optional string email = 3;
}

Am I the ONLY one here that saw this as a JSON construct? Take another look - the format actually has MORE delimiters than JSON. Why couldn’t they use a structured format that actually ALREADY has some libraries for it?
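To show what I mean, here is a rough sketch of the same Person definition written as plain JSON-style data, plus a small function that renders it back into .proto-like text. The key names (`message`, `fields`, `rule`, `tag`) are my own invention for illustration. This is NOT a layout Google defines or accepts.

```python
import json

# Hypothetical JSON layout for the Person message (not an official format).
person_schema = {
    "message": "Person",
    "fields": [
        {"name": "id", "type": "int32", "tag": 1, "rule": "required"},
        {"name": "name", "type": "string", "tag": 2, "rule": "required"},
        {"name": "email", "type": "string", "tag": 3, "rule": "optional"},
    ],
}


def to_proto(schema: dict) -> str:
    """Render the JSON-style schema as .proto-like text, one field per line."""
    lines = [f"message {schema['message']}", "{"]
    for field in schema["fields"]:
        lines.append(
            f"{field['rule']} {field['type']} {field['name']} = {field['tag']};"
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(json.dumps(person_schema, indent=2))  # the JSON view
    print(to_proto(person_schema))              # the .proto-like view
```

Both views carry the same information, and the JSON one can be read by any existing JSON library.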

Forget all that. THE BIG issue? This is NOT the format that is going OVER THE WIRE! Google says that ALL over their documentation, although all of the comments came from people who seem to have missed that. What happens is that they take the proto file and COMPILE it to create a schema and then send the data. Does everyone here think XML automatically means the DOM or XSLT? I have some great tools (from Microsoft!) for taking XML documents, creating a schema, and serializing JUST the data with minimal metainfo at the same level of data density.
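Here is a minimal sketch of that idea: the definition stays in a readable format, both sides agree on a schema ahead of time, and only the values cross the wire. It is NOT Google's wire format and NOT the Microsoft tooling I mentioned. The `SCHEMA` list, the `pack` and `unpack` names, and the byte layout (4-byte little-endian integers, 2-byte length prefix on strings) are all assumptions made for illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema: (element name, kind) in a fixed order both sides share.
SCHEMA = [("id", "int"), ("name", "str"), ("email", "str")]


def pack(xml_text: str) -> bytes:
    """Serialize only the values from an XML document, in schema order."""
    root = ET.fromstring(xml_text)
    out = bytearray()
    for name, kind in SCHEMA:
        text = root.findtext(name, default="")
        if kind == "int":
            # Raises ValueError if a required integer element is missing.
            out += struct.pack("<i", int(text))
        else:
            data = text.encode("utf-8")
            out += struct.pack("<H", len(data)) + data  # length prefix, then bytes
    return bytes(out)


def unpack(blob: bytes) -> dict:
    """Rebuild the field values using the same shared schema."""
    result, pos = {}, 0
    for name, kind in SCHEMA:
        if kind == "int":
            (result[name],) = struct.unpack_from("<i", blob, pos)
            pos += 4
        else:
            (length,) = struct.unpack_from("<H", blob, pos)
            pos += 2
            result[name] = blob[pos:pos + length].decode("utf-8")
            pos += length
    return result


if __name__ == "__main__":
    doc = "<Person><id>7</id><name>Ann</name><email>a@b.c</email></Person>"
    blob = pack(doc)
    print(len(doc), len(blob), unpack(blob))
```

No tag names travel with the data, because the schema already tells the receiver what comes next. The XML never had to be the thing on the wire.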

Let me summarize:
1) We ALL agree XML is not the best interchange for EVERYTHING
2) Google COULD have used XML if they wanted (or JSON, which looks like a better fit)
3) Using an established format would NOT have affected the actual TRANSFER
4) I admit that Google knows their stuff and has probably analyzed this and concluded that XML for the TRANSFORMATION may have been too fat. HOWEVER - they optimized, and in so doing specifically limited expandability or at least reuse.
5) Building a library to do a SPECIFIC task well is OK, but the crux of my argument is that IF they had spent this time creating better XML or JSON libraries, then we would ALL benefit in projects that do not use this protocol.

Ok - BEFORE you Google Bomb me into next year, I apologize for calling SOME posters names. If you wish to dissect my treatise, please review the actual spec and address my specific points above. C’mon - this isn’t Slashdot - I welcome any REASONED argument.