ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


What Is OpenDocument

by Sam Hiser, coauthor of Exploring the JDS Linux Desktop
07/27/2006
The OpenDocument Format (ODF) is an emerging file format standard for electronic office documents. Representing a triumph of common sense over the methods conceived before the rise of the Internet, ODF's goals are both exciting and controversial. Early adopters of the format include state and municipal governments in some near- and far-flung places, and this makes the format's progress a thing to watch. Yet innovation theory tells us there are some hurdles we all must overcome before ODF becomes a regular topic of conversation at the ballpark. Those in the know, however, recognize that we're in about the second inning of a barn-burner. So, grab a hot dog and a beer, and settle in for a classic.

In This Article:

  1. ODF: What's It Made Of?
  2. Find It Here
  3. File Formats in General
  4. OpenDocument Format Is a Specification
  5. ODF Is an Open Standard
  6. ODF Is Not Open Source, Nor Is It Free Software
  7. ODF Is Approved for Use in Free Software
  8. Software Applications Offering ODF
  9. Adoption of ODF
  10. Microsoft's Lunge at an XML File Format
  11. The DADA Theater of Lobbying Public-Sector Customers
  12. Standardizing Document Formats Is a Natural Progression
  13. Conclusion

ODF: What's It Made Of?

XML is the "connective tissue" [1] that binds different IT systems together for delivering information seamlessly anywhere across the Internet. In a world where IT systems have traditionally worked poorly together or "talked" poorly to each other, this kind of description gets a lot of attention. That's why, when the XML standard emerged in the late 1990s, people who often worked with documents recognized it as an interesting solution to a large number of document and data problems.

Enter, then, OpenDocument Format, the open standard implementation of XML for office documents. An open standard recipe for organizing document data is very different from what we're used to. Until now, the organizing principles for our document data have been hidden from public view, because they were developed by a private enterprise and used for competitive advantage. Given the obscurity of document formats and of technical standards work, it's easy to miss the importance of an XML-based open document format standard.

With the OpenDocument Format, we're talking about a very different way of doing things. Documents become the center of attention, not applications. While this has large benefits for the way information is generated, connected, accessed, and archived, it ruffles the feathers of people and businesses that are committed to the established, if inferior, ways.

How could the leading application software vendor for documents not be offended by the aspirations of OpenDocument Format? It disrupts Microsoft's influence on its customers in large and important ways--both direct and indirect. If OpenDocument Format does not launch the most important worldwide software standards battle, then it will at least provide the very best theater for the citizenry to chide, heckle, and throw tomatoes on the stage--as the established software vendor cajoles its old and new customers back into deep dependence on a single vendor.

Find It Here

OpenDocument Format (ODF) is a file format for office documents that is available to users of a growing list of software applications. IBM Workplace Managed Client, OpenOffice.org 2.0, and its sibling commercial office suite, StarOffice 8, are the most mature tools available today to users seeking to create documents in ODF.

ODF offers an open alternative to the formats used by all of the existing Microsoft Office application versions for text, spreadsheet, presentation, and other kinds of documents. The most familiar and commonly used file extensions are Microsoft's .doc, .xls, and .ppt. OpenDocument's main file extensions are .odt (for text documents), .ods (for spreadsheets), and .odp (for presentations). These are analogous to the Microsoft extensions and will be more commonly recognized as more people and organizations adopt OpenDocument-ready software.
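The extension pairings above can be written down as a small lookup table. The following is a minimal sketch: the `describe` function and both dictionary names are made up for illustration, and the media type strings are the ones registered for ODF, which the article itself does not list.

```python
from pathlib import PurePath

# Extensions named in the article, paired with the registered ODF media types.
ODF_TYPES = {
    ".odt": ("text document", "application/vnd.oasis.opendocument.text"),
    ".ods": ("spreadsheet", "application/vnd.oasis.opendocument.spreadsheet"),
    ".odp": ("presentation", "application/vnd.oasis.opendocument.presentation"),
}

# Microsoft extensions and their rough ODF counterparts.
MS_TO_ODF = {".doc": ".odt", ".xls": ".ods", ".ppt": ".odp"}


def describe(filename: str) -> str:
    """Describe a file based only on its extension (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    if ext in ODF_TYPES:
        kind, media_type = ODF_TYPES[ext]
        return f"ODF {kind} ({media_type})"
    if ext in MS_TO_ODF:
        return f"Microsoft format; ODF counterpart is {MS_TO_ODF[ext]}"
    return "unknown"
```

An extension is only a hint, of course; a careful tool would look inside the file rather than trust its name.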

ODF is an ISO standard. OASIS approved ODF as an OASIS standard in May 2005, and the International Organization for Standardization (ISO) approved it in May 2006. ISO approval is important for any format, because it permits software that implements the format to enter the menu of approved products for procurement by many--if not most--municipal, state, and national government IT departments around the world. The OASIS approval took place a little over one year ago, which is why many government IT departments are now announcing ODF adoption plans. (It takes a long time for governments to plan for and implement change.)

File Formats in General

Think about it. If you were trying to design a common way to identify the location and organization of letter and number characters in varying fonts, sizes, and styles on a page for graphical display onscreen or for printout, you would be developing what is considered in computing terms a markup language. We already have several standards for markup: one is HTML (the HyperText Markup Language standard of the W3C) for web pages. HTML is being replaced by XHTML (eXtensible HyperText Markup Language), a more explicit, stricter (less lenient, some would say) markup language for electronic documents. Now, XML (eXtensible Markup Language) is the markup language for creating special markup languages. XML evolved as a standard with the primary purpose of facilitating data sharing across disparate systems. This is why XML was chosen as the basis for the new office suite file formats. It just makes good sense to have a single set of rules to which everyone refers when designing software applications and systems that handle documents.
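To make "a markup language for creating special markup languages" concrete, here is a minimal sketch using Python's standard XML library. The `memo` vocabulary (its element and attribute names) is invented for this illustration and has nothing to do with ODF's actual element names; the point is only that XML supplies the syntax while each vocabulary supplies the meaning.

```python
import xml.etree.ElementTree as ET

# Build a document in a made-up vocabulary; XML itself supplies only the syntax rules.
memo = ET.Element("memo", {"date": "2006-07-27"})
ET.SubElement(memo, "to").text = "IT Department"
ET.SubElement(memo, "subject").text = "File formats"
body = ET.SubElement(memo, "body")
ET.SubElement(body, "para", {"style": "bold"}).text = "Documents outlive applications."

# Serialize to text, as one program would before handing the data to another.
xml_text = ET.tostring(memo, encoding="unicode")

# A different program needs only an XML parser and the vocabulary's rules to read it back.
parsed = ET.fromstring(xml_text)
subject = parsed.findtext("subject")
styles = [p.get("style") for p in parsed.iter("para")]
```

Nothing in the second half depends on the program that wrote the data, which is the property the office formats are after.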

Office suite applications, as always, need a file format that is designed to organize the data when it moves away from the application. This is so people with different machines in different places can open and edit the data in a file. This has as much to do with computing and data transmission limitations since the 1970s as with anything else, because the software application is quite large (in terms of the amount of its code) when compared with the small amount of data in a business letter, a facsimile cover page, or even the draft of this article. Data files can easily move around because they are small relative to transmission limits (bandwidth), while applications aren't moved because of their large size. In the distant future (perhaps over 10 years from now), when limitations evolve--when, for instance, the software application may become small enough to travel with the data--the need to separate file format from application may not even exist. (Some examples of application code traveling with the data exist today, but the office suite is not among them.)

In the meantime, the design intention of the standard document file format is important. The formats most people still use today are designed and owned by a single company, Microsoft (the .doc, .xls, and .ppt formats noted above). These are not format standards in the appropriate, full sense; they are widely used, but that is not enough to call them a standard. Having a file format standard (a single standard, because having more than one is an oxymoron) is important, as it's optimal to have multiple software vendors and projects competing on price and features to provide office suite products designed from that standard. Interoperability between applications is also optimized when there is a common open standard format around which all relevant applications can be designed.

The ideal for a standard file format is one that is open, accessible to everyone, and that can be implemented in any kind of software--whether of the commercial type, or of the Free Software or open source type. The OpenDocument Format is our best chance at implementing a successful open standard file format for documents in the office suite context. It is an ISO standard and is already the target of IT policy or legislation in Massachusetts, Minnesota, the Bristol City Council (England), and at the national and municipal government levels in Belgium, Denmark, France, Australia, South Korea, and Malaysia, among others. Furthermore, there are unannounced publicly and privately traded companies that are pursuing vendor neutrality and modularity in their IT systems and expressing interest in ODF for their computer systems as the first step to an ideal end-state based on open standards.

OpenDocument Format Is a Specification

The OpenDocument Format itself is a technical specification. That is, it is essentially a document--a piece of paper. The specification is the complete set of instructions, the recipe, that any software developers or entities can freely and openly use to incorporate ODF into their software, including but not limited to office suite applications such as those mentioned above.
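As a rough illustration of what "freely and openly use" means in practice: an ODF file is a ZIP package whose main content sits in an XML file named content.xml, so general-purpose libraries are enough to get at the text. The article does not go into packaging, so treat the following as a sketch under that assumption. The sample package is deliberately simplified and is not a complete, valid ODF document (it has no manifest, for one), and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by ODF for document structure and text content.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

CONTENT_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Dear reader,</text:p>"
    "<text:p>This paragraph lives in content.xml.</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def make_sample_package() -> bytes:
    """Build a simplified in-memory stand-in for an .odt file (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The mimetype entry is written first and stored uncompressed.
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                         compress_type=zipfile.ZIP_STORED)
        package.writestr("content.xml", CONTENT_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_paragraphs(package_bytes: bytes) -> list:
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Calling `read_paragraphs(make_sample_package())` returns the two paragraphs without any office suite being involved. A real reader would also have to handle styles, lists, tables, and the rest of the specification.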

The OpenDocument Format specification document resides at the Organization for the Advancement of Structured Information Standards (OASIS) website. The OASIS OpenDocument Format for Office Applications Technical Committee (OASIS ODF TC) is responsible for making changes to the format, keeping it technically up-to-date, and permitting it to evolve with innovations in document technology that will certainly occur in the future.

ODF Is an Open Standard

As an example of an open software standard, OpenDocument adheres to the following criteria in its modes of development and use:

Any format that does not adhere to all of these criteria cannot be considered an open standard.

ODF Is Not Open Source, Nor Is It Free Software

But it is open and free. The OpenDocument Format is a specification; an OpenDocument file does not become software, per se, or take form in a file until some software application creates or changes it. OpenDocument Format, being an open standard, can be implemented in open source and free software applications, as well as commercial applications.

Therefore, the license under which the OpenDocument-ready application is distributed does not impact the license of, access to, or redistribution of OpenDocument, the specification. (Although it does impact the access to specific OpenDocument-ready applications.)

It suffices to say that there is a mature, downloadable open source office suite that offers the OpenDocument format as its native default (OpenOffice.org 2.0); and there is a mature commercial application that offers OpenDocument, too (StarOffice 8). Accordingly, software licenses, business models, or software development models do not restrict access to organizations seeking to use the OpenDocument Format.

ODF Is Approved for Use in Free Software

Recently, the Software Freedom Law Center (SFLC) issued a legal opinion that ODF is free of legal encumbrances that would prevent its use in free and open source software, as distributed under licenses authored by Apache and the Free Software Foundation--including the GNU GPL and the Apache software licenses.

In no way does this opinion indicate that there is any question about ODF's usability with commercial software. ODF is therefore free to be used and deployed in open source, free software and commercial software, with the same rights available to everyone developing software that interoperates with the OpenDocument Format.

Software Applications Offering ODF

Sun Microsystems engineers Daniel Vogelheim and Michael Breuer gave us the reference implementations of OpenDocument in the early versions of OpenOffice and StarOffice beginning in 1999, so the specification may contain some technologies to which Sun Microsystems retains the rights (none have been specifically declared). Yet Sun offers a perpetual and reciprocal royalty-free license for the OpenDocument specification (just in case there is a question).

Presently, the "Big Four" mature applications that offer ODF as a default file-format option include:

Other applications offer some (but incomplete) ODF support today. They include the web-based word processors Writely (from Google), Zoho Writer, and ajaxWrite, as well as the Mac version of OpenOffice.org, called NeoOffice. You can track the progress of these products and find new additions to the category on Wikipedia under the OpenDocument software heading.

Additionally, software from document collaboration companies--including Alfresco--is starting to show up on the lists of products associated with ODF.

The market presence of multiple new office software products based on the OpenDocument Format is a sufficient indication that growing confidence in ODF, and the belief that an open format is capable of competing against the Microsoft formats, are already driving more choice and lower prices into the office-suite software market.

Adoption of ODF

At the time of this article's publication, the number of entities worldwide that have openly expressed support for ODF through the ODF Alliance (the Washington, D.C.-based principal lobbying group) is fluid and growing. The ODF Alliance was formed in the autumn of 2005 and, as IBM's Open Source and Open Standards VP, Bob Sutor, said, "Since the news broke of [Google] joining, the ODF Alliance membership went up from 20 to about 260." Contrast this with the Alliance's modest aspirations back when ODF was beginning to gain momentum: when the Alliance was first formed, Sutor said it was considered positive that there were as many as 10 founding members.

If the ODF Alliance is any reflection of interest in vendor-neutral file formats, then adoptions will be deep and wide in coming years. The complete and growing list of Alliance members is here. And, below, you'll find a gloss of the earliest movers at the municipal or state government level.

The Commonwealth of Massachusetts

Instead of legislating for an open office document standard, Massachusetts' Executive Department Information Technology Division ("ITD") took two years to formulate a responsible and thoughtful policy for defining many open software standards and dates by which they should be implemented. This policy initiative--the Enterprise Technical Reference Model version 3.5 (PDF) ("ETRM 3.5")--went public at the end of August 2005 and was finalized in late September, after public comments were taken into account.

Since ETRM 3.5 became final, the Commonwealth's ITD has been making progress on a) running a pilot program to test ODF-ready software applications, b) developing a professional cost model, and c) managing relationships with Massachusetts accessibility groups who would be affected by alterations in the way state agencies work with office documents. The CIO of ITD, Louis Gutierrez, recently affirmed the go-live date (January 1, 2007) for the implementation of ODF across Executive Department agencies--"Mass. holding tight to OpenDocument" (Martin Lamonica - c|net, July 7, 2006). With many CIOs around the world looking in on the progress in the Commonwealth, the affirmation has been an external confidence boost, while internally, political opponents of Governor Mitt Romney and the Microsoft lobby remain active and continue to try different strategies to dampen ITD's pursuit of ODF.

The State of Minnesota

Minnesota has legislation pending that would enforce policies toward standardization around ODF in state government offices. Characteristic of legislation initiatives, this could take another year to gain traction, if it is ultimately successful at all.

The State of California

Like the U.S. Department of Defense, the State of California is no stranger to open source and free software, which are being generally implemented throughout California state agencies, most commonly on servers in the form of Linux or the BSDs. The state is among the most curious about ODF and how the document format can provide a stepping stone for more modular systems and eventually, for desktop software neutrality.

Bristol City Council (England)

The Bristol City Council is well under way with an implementation of StarOffice across its approximately 5,500 Windows desktop machines. This makes Bristol an ODF location by default because StarOffice is among the principal ODF-ready office suites.

Belgium

Belgium's Council of Ministers recently announced a policy to go with ODF as the standard for exchanging documents within the government. Belgium's federal services must use ODF when exchanging documents, though other formats will still be allowed for internal use.

Denmark

Danish Member of Parliament Morten Helveg Petersen introduced a draft of a motion in the Danish Parliament supporting the use of open standards in Danish government. Later, the Danish Ministry of Science, Technology, and Innovation announced in May 2005 the commencement of a six-month trial period for evaluating ODF-ready software. And from September 1, 2006, online and written publications will be available in the OpenDocument Format. These developments seem to have been influenced by John Goetze's "Special Report" (here, in Danish), a December 2005 analysis of the economic effects of open standards. The report concludes that while it's difficult to estimate the precise costs and benefits of implementing open standards, there are strong reasons for making open standards compulsory in government where interoperability is at stake.

France

The French National Police (Gendarmerie) adopted OpenOffice last year as part of the transition of most of its 100,000 PCs to open source software. Other agencies in France are reported to be planning migrations to OpenDocument-ready software, too. Also, the country's General Repository for Interoperability (RGI) is recommending ODF as the standard format for office documents and seeks to make ODF mandatory for exchanging documents.

Microsoft's Lunge at an XML File Format

Office Open XML is the name of Microsoft's next-generation file formats for its office suite application, Office 2007, which will be released some time next year. These formats do implement the XML standard; however, "open" they are not.

Even the name is curious. Microsoft chose this name to create confusion in the market between the well-established OpenOffice project and product (which was the first implementation of the OpenDocument format, mentioned above) and Microsoft's late-coming formats. In fact, Microsoft lobbied ECMA to request that OASIS change the name from Open Office Format TC, which is when ODF became OpenDocument. Microsoft followed suit, coming right in and naming its new format Office Open XML to maximize the public confusion about the competing implementations of the XML standard.

Instead of adopting ODF (which it may do at any time), Microsoft has elected to create its own new document formats to leverage the benefits of XML inside office documents, while maintaining leverage over the interoperability of customers' IT systems. Their new file formats, called Office Open XML, are being designed under the auspices of the ECMA consortium in Europe, and will be fast-tracked to ISO next year (although ratification is not guaranteed). The ECMA specification for Office Open XML reveals that Microsoft intends to tie functions within its Office 2007 office suite, functions within its new Vista operating system, and functions within its Exchange Server and SharePoint Portal Server to its new, XML-ready document file format. Continuing to tie the new XML file format not only defeats the design objectives of XML, but clearly perpetuates a familiar control by the company over customers' document data and software upgrade cycles.

Developers and observers who are technically inclined (as well as sensitive to both overt and embedded political and marketing messaging) might enjoy following Microsoft's development process for Office Open XML from Brian Jones' blog.

The DADA Theater of Lobbying Public-Sector Customers

Lobbying by Microsoft in ODF hotspots such as Massachusetts has reached Kafka-esque proportions of inanity and self-contradiction--but it has been fun to watch.

Microsoft's Alan Yates' argument for dual standards from the Massachusetts Senate Reading Room (December 14, 2005) was a stumbling oxymoron:

"What I'm really going to be talking about is Massachusetts actually opening up to more choice and more competition than the current policy has...I think that's the fundamental decision that's before us. Can Massachusetts open up to more choice, additional standards, in order to enable greater value over a period of time? And by doing that, by enabling more choice over a period of time, you avoid the industry warfare that tends to jerk governments around from one month to the next month, to one debate to the next debate to the next debate." (Microsoft's Yates' to MA: How About 2 Standards?--Transcript | GROKLAW)

Even if one respects Microsoft's right to compete in selling products in the software application markets, one would still object to the arrogant way they recast agreed-upon meanings and reframe discussions. Yates said, "I'm...talking about...Massachusetts opening up to more choice and more competition...more choice, additional standards..." Alan Yates is talking in a vacuum because the Commonwealth--through the ETRM 3.5 policy--has asserted its position as being about more choice and more competition in application software markets. As I've said before, choice and competition in the context of standards doesn't make sense, at best; it is meaningless drivel, at worst.

The difference is not nuanced. Yates here is co-opting the terms "choice" and "competition" to his own meanings, quite off-topic. This is intended to confuse the audience. Marcel Duchamp is applauding in his grave, because the audience is confused. Microsoft's influence is so penetrating that many people believe anything they say, even things that are laughably false. But to the ODF cognoscenti, this kind of thing is extremely cynical at face value, so knowingly manipulative.

If Microsoft were sincerely interested in competing and playing fair, they would respond to the customer's request: they would insert ODF capabilities in their Office software. If they didn't feel they could slip in a bogus "open" standard file format through sheer bullying brute force, they would not take this absurdist tack. Their petulant response to ODF is telling. Ultimately, it's pure entertainment.

When Office 2007 is released next year, the market will indicate its appetite for an Office Open XML file format, which is not open by the accepted definition of open standard software. Customers comfortable with another decade's commitment to a Microsoft-only infrastructure will not be wary of a commitment to the Office Open XML file format. But those sensitive to single-vendor control of their data will hold out for ODF and its openly interoperable solutions.

Standardizing Document Formats Is a Natural Progression

As technologies mature, shared facilities always become standardized. This has been true of common units of measure, construction materials (screws, pipes, fasteners, wood and metal subcomponents such as the two-by-four, among others), as well as Internet protocols (TCP/IP). There is every reason to expect that document format standardization around an open specification (exemplified by ODF) should drive competition into document tools markets and drive innovation toward all the things we can do with documents and the data within. If opening and sharing the Internet protocol, TCP/IP, gave us email, instant messaging, and web services, as well as commercial phenomena such as Amazon.com, Google, eBay, Craigslist, and other imaginative new ways of communicating and finding others with common interests, then standardizing and ensuring the wide dissemination of an open file format should yield similarly exciting tools, processes, and ways of accessing information in and across the sphere of documents.

OpenDocument was created to overcome many of the problems of proprietary document formats. Yet it is not solely a competitive missile directed at the established format owner, Microsoft. This is because the implications of an open file format for documents generate more value for the global ICT infrastructure at all levels than could ever be represented in a single company. And the origins and impact of the OpenDocument Format are far beyond the commercial sphere.

Information or data created by users and organizations and stored in office documents belongs to the creators of that data. Yet, when the most common document formats are proprietary (or not open), users lose control of their data through dependency upon the software company or entity that controls the data format. When the controlling entity makes changes to its format, this forces software reacquisition upon users, which can be expensive as well as unnecessary. The OpenDocument Format, by being openly developed and having no impediments to its usage or access, provides an excellent solution to this problem of control by offering an open standard data format for use in all kinds of software--free as well as commercial.

OpenDocument cannot avoid being defined today by the established scenarios it controverts. However, it is important to be mindful that OpenDocument's implications extend to doing things with documents and information that have not been invented (nor imagined) yet. Nor could OpenDocument's potential ever be successfully defined in the popular imagination within the gross limits of the model of personal computing and connectivity we know today.

Conclusion

At one time the main interface for working with information in documents was the software application (an office suite or a text editor of some kind); now, the main interface is the document itself, and it won't matter what application you use. The OpenDocument Format is bringing the world from an application-centric model of computing to a document-centric model of computing. This means that creating new business processes will be as easy as typing a memo on a PC or working with a small connected device.

Application-centrism isn't necessarily bad, unless a single company owns and hides the software application's code and all the data created by it. Where PC users have been accustomed for years to the constipation of version madness--the frustration from work stoppages caused by the frequent incompatibility of documents with software applications of different vintages--OpenDocument Format resets the bar for ease of access to the document data. And it is cause for hope among organizations with even modest aspirations for longer document life cycles and smoother business processes.

Sam Hiser is Vice President & Director Business Affairs at the OpenDocument Foundation, Inc. He was advisor to the Commonwealth of Massachusetts Information Technology Division on its pilot of OpenDocument-ready software this year. Hiser also blogs at www.PlexNex.com.

Footnote:
1. Gary Edwards, President of the OpenDocument Foundation, coined this term.



Copyright © 2009 O'Reilly Media, Inc.