Update: Simon Phipps has stated that my assertion that Novell is the largest contributor of source code to OpenOffice.org is incorrect, and in fact it is Sun Microsystems who is the largest contributor. My data says otherwise, but I also have a lot of trust in as well as respect for Simon, so I have to assume that my data is incorrect.
I don’t think this changes my assertion that Miguel can be seen as an authority figure when it comes to the technical debate between ODF and EOOXML. Regardless of whether Novell is, in fact, the largest contributor, they are a significant contributor, and therefore his understanding of the technical issues involved is significant. Nonetheless, if my assertion is incorrect (and, again, I can only assume that it is), then please take this into consideration in your overall analysis of both this post and the follow-up comments.
Thanks for helping to clarify the facts, Simon!
Update: bryan presents a more refreshing perspective than what seems to be the standard “I hate Microsoft!” attitude when it comes to why people feel EOOXML is a bad thing.
Basically if someone asked me to work with OOXML for extracting data I would say, sure but it would be cheaper and easier to use Microsoft’s APIs to work with office data, or to convert the document to ODF and extract the data that way.
This is really my only dislike of ooxml. I don’t think it qualifies as FUD, it is just my experience of how it is to work with these technologies.
In follow-up, Miguel de Icaza provides a solution to the stated problem at hand,
This is an article about LINQ and how to extract Word ML document using it:
Folks have showcased LINQ-like technologies for Ruby and Python, so it cant be that hard to parse.
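To give a rough feel for the kind of extraction being discussed, here is a minimal Python sketch that uses only the standard library. It assumes the document is a WordprocessingML package (a zip archive with the body stored at `word/document.xml`), and the helper names `extract_paragraphs` and `make_sample_docx` are my own, made up for illustration. It is not the LINQ approach from the linked article, just a plain XML walk over paragraphs and text runs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes):
    """Return the text of each <w:p> paragraph in a .docx package (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread across <w:t> elements inside runs.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx(paragraphs):
    """Build a tiny in-memory package for demonstration only.

    Assumption: this holds just enough structure for the extractor above;
    it is not a complete, valid Word document.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx(["Quarterly report", "Revenue & costs"])
    for line in extract_paragraphs(sample):
        print(line)
```

Real documents carry far more than plain runs (tables, fields, embedded objects), so treat this as a starting point for poking at the markup rather than a complete reader.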
What I truly admire and appreciate about this exchange is pretty straightforward:
A real-world problem with using EOOXML, followed by a real-world solution. In other words, just like what takes place daily in both the open and closed source camps (though the community aspect and overall openness are, as should be obvious, generally much more prevalent in the OSS camps), two hackers found a way to present a real-world problem and a real-world solution to it.
This is the way it *NEEDS* to be, folks. The FUD and the anti-EOOXML smear campaigns accomplish *NOTHING*, whereas the exchange that took place below accomplished exactly what needed to be accomplished: find where the problems exist and then fix them.
Thanks to both of you for providing a picture-perfect example of how things both could and should be working as we move forward into the next generation of open xml document formats!
So I just finished up reading an interesting post from Miguel de Icaza regarding ODF vs. EOOXML, and felt that it was really quite important to share with the rest of you all,
… I think that the group is not only shooting themselves in the foot, they are shooting all of our collective open source feet.
Interesting lead-in, and something that I can assure you lives up to the promise of showcasing why the ODF vs. EOOXML battlefield is doing more harm than it could ever do good. But before I move on, there’s something I’ve been wanting to get off my chest…
Thanks, people. You really mean a lot to me as well.
Okay, so now that we have the “goo goo eyed” love speech out of the way, I figured, “You know, with as much love as you get thrown your direction, you really should consider spreading some of it around.” In other words, “Don’t be so greedy, damn it! Share the love, baby, *SHARE THE LOVE*!”
So in the spirit of sharing, I’ve created for each of you this *LIMITED EDITION* collector’s “M. David [HEART]’s You” image that can be freely printed on a t-shirt, or quite possibly a button (or somethin’) that you can tack to your lapel and proudly wear to work each day to show everyone how much you are truly loved.
You’re Welcome! :D
Okay, now that the love has been properly shared, back to Miguel’s post.
As many of you I am sure know, Miguel, a Linux-community luminary, is Vice President of Developer Platforms at Novell. Again, as many of you will know, Novell is the top contributor of source code to the OpenOffice.org project. So to suggest that he kinda might know a thing or two about what’s really going on with the ODF vs. EOOXML debate would be somewhat of an understatement.
A couple of quick sound bites, after which I would encourage all of you who are interested in understanding what is *REALLY* going on (as opposed to the brainwashing FUD that’s being hucked around as of late) to visit the above link to learn more,
Unlike the XML Schema vs Relax NG discussion where the advantages of one system over the other are very clear, the quality differences between the OOXML and ODF markup are hard to articulate.
The high-level comparisons so far have focused on tiny details (encoding, model used for the XML). There is nothing fundamentally better or worse in those standards like there is between XML Schema and Relax NG.
A common objection to OOXML is that the specification is “too big”, that 6,000 pages is a bit too much for a specification and that this would prevent third parties from implementing support for the standard.
Considering that for years we, the open source community, have been trying to extract as much information about protocols and file formats from Microsoft, this is actually a good thing.
Depending on how you count, ODF has 4 to 10 pages devoted to it. There is no way you could build a spreadsheet software based on this specification.
To build a spreadsheet program based on ODF you would have to resort to an existing implementation source code (OpenOffice.org, Gnumeric) or you would have to resort to Microsoft’s public documentation or ironically to the OOXML specification.
The ODF Alliance in their OOXML Fact Sheet conveniently ignores this issue.
Some of the objections over OOXML are based around the fact that it does not use existing ISO standards for some of the bits in it. They list 7 ISO standards that OOXML does not use: 8601 dates and times; 639 names and languages; 8632 computer graphics and metafiles; 10118-3 cryptography as well as a handful of W3C standards.
By comparison, ODF only references three ISO standards: Relax NG (OOXML also references this one), 639 (language codes) and 3166 (country codes).
Not only it is demanded that OOXML abide by more standards than ISO’s own ODF does, but also that the format used for metafiles from 1999 be used. It seems like it would prevent some nice features developed in the last 8 years for no other reason than “there was a standard for it”.
If Microsoft had produced 760 pages (the size of ODF) as the documentation for the “.doc”, “.xls” and “.ppt” that lacked for example the formula specification, wouldn’t people justly complain that the specification was incomplete and was useless?
Okay, so this should at least be enough to pique your interest. As such, I would encourage each and every one of you to get *THE FACTS* as they truly are, from someone who, like Rick Jelliffe, truly understands what the facts are, as opposed to what the EOOXML smear campaigners would like you to believe.
And, as always, thanks for reading. This stuff is important!