I’m teaching a course this week where the attendees requested some basic information on office file formats. People want to know how easy it is to convert the kind of XML they generate into other formats. So I loaded a simple HTML file with headings, a paragraph, a table and a list, and converted it to various office XML(ish) formats: HTML (through Word 2000), WordML (through Word 2003), pre-standard ODF (STW, through Open Office 2.0.2), ODF (through Open Office 2.0.2), XML from the Office 2007 beta, and XSL-FO using HTML2FO.
The document is a simple one that looks like this (the original has no formatting: the following example may inherit styles not intended):
A heading
A paragraph
A subheading
| A | B |
|---|---|
| 1 | 2 |
- a
- bullet
- list
Some of the file formats save as ZIP, in which case I extracted the content file and left out any style files or metadata files (some of the MS files have embedded metadata). Most of the formats just spew their data out onto one line, so I reformatted the XML in Topologi Markup Editor, using “Publishing Style” in the Foreman and XML delimiting.
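For readers without Topologi, the extract-and-reindent step can be roughly approximated with the Python standard library. This is only a sketch: the function name `extract_pretty_xml` is made up for illustration, `content.xml` is the member name OpenOffice uses for the document body, and minidom's re-indentation will not match the “Publishing Style” layout used for the examples here.

```python
import io
import zipfile
from xml.dom import minidom


def extract_pretty_xml(package, member="content.xml"):
    """Return one XML member of a ZIP-based office file, re-indented.

    `package` may be a file name or a seekable file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        raw = zf.read(member)
    # toprettyxml() inserts newlines and indentation. That changes
    # whitespace, so the result is for reading, not for saving back.
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    # Build a tiny stand-in package in memory so the sketch runs anywhere.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc><p>A paragraph</p></doc>")
    buf.seek(0)
    print(extract_pretty_xml(buf))
```

With a real file, a call such as `extract_pretty_xml("eg.odt")` would be the equivalent of the manual step described above.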
The WordML file contained a few odd characters such as U+F0A7 (Topologi replaces them with a PI in the examples below to mark them out), which is a character in the Private Use Area (PUA) of Unicode. I’m not completely sure what the purpose of this is, but I suspect they have mapped the Wingdings font to the PUA. I don’t know why they don’t just use the real Unicode characters there; perhaps the same mechanism is used for accessing user-defined fonts (as used in East Asia) with non-standard characters.
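A small sketch of how such characters could be spotted and marked in your own processing. The PUA range U+E000 to U+F8FF is from Unicode; the function names and the marker format are made up for illustration, and reading the low byte of a U+F0xx code as a glyph index in a symbol font is an assumption in line with the suspicion above, not something checked against Word.

```python
def is_private_use(ch):
    """True if ch lies in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


def mark_private_use(text):
    """Replace each PUA character with a visible marker (hypothetical format)."""
    out = []
    for ch in text:
        if is_private_use(ch):
            out.append("[PUA U+%04X]" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def symbol_font_byte(ch):
    """Low byte of a U+F0xx code point, else None.

    Assumption: the byte is the glyph's position in a symbol font
    such as Symbol or Wingdings.
    """
    cp = ord(ch)
    return cp - 0xF000 if 0xF000 <= cp <= 0xF0FF else None


if __name__ == "__main__":
    print(mark_private_use("\uf0a7 a list label"))
    print(hex(symbol_font_byte("\uf0a7")))
```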
Word 2007 gives two options for saving as XML. If you just save it as a Word document, it is saved as a ZIP file, and the XML content is in “word/document.xml”. You can also save it directly to XML, which combines all the parts into a single file: that’s what I used below, with the extra parts removed.
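To give a feel for getting at the ZIP form, here is a minimal sketch that opens the package, parses “word/document.xml” and collects the text of each paragraph. It matches elements by local name (`p` and `t`) rather than by namespace, which is a crude assumption made so that it does not depend on the exact namespace URI a given beta or release uses; the function names are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def paragraph_texts(docx, member="word/document.xml"):
    """Return the text of every paragraph element in the main document part."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read(member))
    paragraphs = []
    for el in root.iter():
        if local_name(el.tag) == "p":
            # A paragraph's text is spread over runs; join the text elements.
            text = "".join(
                t.text or "" for t in el.iter() if local_name(t.tag) == "t"
            )
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    # Stand-in package with a placeholder namespace, built in memory.
    xml = (
        '<w:document xmlns:w="urn:example:placeholder"><w:body>'
        "<w:p><w:r><w:t>A heading</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>A </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    print(paragraph_texts(buf))
```

Matching on local names will also pick up same-named elements from other vocabularies, so a real converter would want to check the namespace as well.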
As for sizes, this is a tiny example, but it shows the range:
512   eg.html
4.0K  eg-word.htm (Word 2000)
4.5K  eg-fo.xml (HTML2FO)
4.5K  eg-word2007.docx/word/document.xml (Word 2007)
8.0K  eg.stw (ZIP file)
8.5K  eg.odt (ZIP file)
9.5K  content-swt.xml (extracted contents)
10K   content-odt.xml (extracted contents)
10K   eg.rtf (Word 2000)
11K   eg-word2007.xml (Word 2003)
12K   eg.word2007.docx (Word 2007)
40K   eg-word2007.xml (Word 2007)
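To put the listing in proportion, the sizes can be expressed as multiples of the source HTML. The sketch below assumes that “K” means 1024 bytes and that the listed figures are exact, although they are probably rounded by whatever tool produced the listing; the function names are made up, and only a few entries are copied in.

```python
def to_bytes(size):
    """Convert a listing size such as '512' or '4.5K' to bytes (K taken as 1024)."""
    if size.endswith("K"):
        return int(float(size[:-1]) * 1024)
    return int(size)


def ratios(listing):
    """Pair each name with its size relative to the first entry."""
    base = to_bytes(listing[0][1])
    return [(name, to_bytes(size) / base) for name, size in listing]


# Figures copied from the listing above; the first entry is the source HTML.
LISTING = [
    ("eg.html", "512"),
    ("eg-word.htm", "4.0K"),
    ("eg-fo.xml", "4.5K"),
    ("content-odt.xml", "10K"),
    ("eg-word2007.xml", "40K"),
]

if __name__ == "__main__":
    for name, ratio in ratios(LISTING):
        print(f"{name}: {ratio:.0f}x the source HTML")
```

On these figures the Word 2000 HTML comes out at 8 times the source and the single-file Word 2007 XML at 80 times, bearing in mind that a 512-byte original exaggerates the fixed overhead of every format.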
Original HTML
<html>
<head>
<title>An Example</title>
</head>
<body>
<h1>A heading</h1>
<p>A paragraph</p>
<div>
<h2>A subheading</h2>
<table>
<tr><th>A</td><td>B</td></tr>
<tr><td>1</td><td>2</td></tr>
</table>
<ul>
<li>a</li>
<li>bullet</li>
<li>list</li>
</ul>
</div>
</body>
</html>
Re-Exported HTML from Word 2000
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./eg-word_files/filelist.xml">
<title>An Example</title>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>Jelliffe</o:Author>
<o:Template>Normal</o:Template>
<o:LastAuthor>Jelliffe</o:LastAuthor>
<o:Revision>2</o:Revision>
<o:TotalTime>2</o:TotalTime>
<o:Created>2006-07-15T07:04:00Z</o:Created>
<o:LastSaved>2006-07-15T07:04:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Words>8</o:Words>
<o:Characters>49</o:Characters>
<o:Company>Allette Systems</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:CharactersWithSpaces>60</o:CharactersWithSpaces>
<o:Version>9.2720</o:Version>
</o:DocumentProperties>
</xml><![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;
mso-font-charset:2;
mso-generic-font-family:auto;
mso-font-pitch:variable;
mso-font-signature:0 268435456 0 0 -2147483648 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:"";
margin:0in;
margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";}
p
{font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.Section1
{page:Section1;}
/* List Definitions */
@list l0
{mso-list-id:2079016596;
mso-list-type:hybrid;
mso-list-template-ids:-784022264 1762421688 139096656 -681416344 1535792872 -1955690730 -515369418 -2012730116 933104990 1380371186;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:F0B7;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1"/>
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US style='tab-interval:.5in'>
<div class=Section1>
<h1>A heading</h1>
<p>A paragraph</p>
<h2>A subheading</h2>
<table border=0 cellpadding=0 style='mso-cellspacing:1.5pt'>
<tr>
<td style='padding:.75pt .75pt .75pt .75pt'>
<p class=MsoNormal align=center style='text-align:center'><b>A<o:p></o:p></b></p>
</td>
</tr>
<tr>
<td style='padding:.75pt .75pt .75pt .75pt'>
<p class=MsoNormal>B</p>
</td>
</tr>
<tr>
<td style='padding:.75pt .75pt .75pt .75pt'>
<p class=MsoNormal>1</p>
</td>
<td style='padding:.75pt .75pt .75pt .75pt'>
<p class=MsoNormal>2</p>
</td>
</tr>
</table>
<ul type=disc>
<li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
mso-list:l0 level1 lfo1;tab-stops:list .5in'>a</li>
<li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
mso-list:l0 level1 lfo1;tab-stops:list .5in'>bullet</li>
<li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
mso-list:l0 level1 lfo1;tab-stops:list .5in'>list</li>
</ul>
</div>
</body>
</html>
WordML from Word 2003
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <?mso-application progid="Word.Document"?> <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve"> <o:DocumentProperties> <o:Title>An Example</o:Title> <o:Author>Jelliffe</o:Author> <o:LastAuthor>Jelliffe</o:LastAuthor> <o:Revision>2</o:Revision> <o:TotalTime>1</o:TotalTime> <o:Created>2006-07-15T07:14:00Z</o:Created> <o:LastSaved>2006-07-15T07:14:00Z</o:LastSaved> <o:Pages>1</o:Pages> <o:Words>8</o:Words> <o:Characters>51</o:Characters> <o:Company>Allette Systems</o:Company> <o:Lines>1</o:Lines> <o:Paragraphs>1</o:Paragraphs> <o:CharactersWithSpaces>58</o:CharactersWithSpaces> <o:Version>11.6359</o:Version></o:DocumentProperties> <w:fonts> <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/> <w:font w:name="Wingdings"> <w:panose-1 w:val="05000000000000000000"/> <w:charset w:val="02"/> <w:family w:val="Auto"/> <w:pitch w:val="variable"/> <w:sig w:usb-0="00000000" w:usb-1="10000000" w:usb-2="00000000" w:usb-3="00000000" w:csb-0="80000000" w:csb-1="00000000"/></w:font></w:fonts> <w:lists> <w:listDef w:listDefId="0"> <w:lsid w:val="1FD84FAF"/> <w:plt w:val="Multilevel"/> <w:tmpl w:val="633EA6D4"/> <w:lvl w:ilvl="0"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0B7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="720"/></w:tabs> <w:ind w:left="720" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts 
w:ascii="Symbol" w:h-ansi="Symbol" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="1" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="o"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="1440"/></w:tabs> <w:ind w:left="1440" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Courier New" w:h-ansi="Courier New" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="2" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="2160"/></w:tabs> <w:ind w:left="2160" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="3" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="2880"/></w:tabs> <w:ind w:left="2880" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="4" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="3600"/></w:tabs> <w:ind w:left="3600" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="5" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="4320"/></w:tabs> <w:ind w:left="4320" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="6" w:tentative="on"> <w:start 
w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="5040"/></w:tabs> <w:ind w:left="5040" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="7" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="5760"/></w:tabs> <w:ind w:left="5760" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl> <w:lvl w:ilvl="8" w:tentative="on"> <w:start w:val="1"/> <w:nfc w:val="23"/> <w:lvlText w:val="<?User defined code xF0A7 found?>"/> <w:lvlJc w:val="left"/> <w:pPr> <w:tabs> <w:tab w:val="list" w:pos="6480"/></w:tabs> <w:ind w:left="6480" w:hanging="360"/></w:pPr> <w:rPr> <w:rFonts w:ascii="Wingdings" w:h-ansi="Wingdings" w:hint="default"/> <w:sz w:val="20"/></w:rPr></w:lvl></w:listDef> <w:list w:ilfo="1"> <w:ilst w:val="0"/></w:list></w:lists> <w:styles> <w:versionOfBuiltInStylenames w:val="4"/> <w:latentStyles w:defLockedState="off" w:latentStyleCount="156"/> <w:style w:type="paragraph" w:default="on" w:styleId="Normal"> <w:name w:val="Normal"/> <w:rPr> <wx:font wx:val="Times New Roman"/> <w:sz w:val="24"/> <w:sz-cs w:val="24"/> <w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA"/></w:rPr></w:style> <w:style w:type="paragraph" w:styleId="Heading1"> <w:name w:val="heading 1"/> <wx:uiName wx:val="Heading 1"/> <w:basedOn w:val="Normal"/> <w:pPr> <w:pStyle w:val="Heading1"/> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/> <w:outlineLvl w:val="0"/></w:pPr> <w:rPr> <wx:font wx:val="Times New Roman"/> <w:b/> <w:b-cs/> <w:kern w:val="36"/> <w:sz w:val="48"/> <w:sz-cs w:val="48"/></w:rPr></w:style> <w:style w:type="paragraph" 
w:styleId="Heading2"> <w:name w:val="heading 2"/> <wx:uiName wx:val="Heading 2"/> <w:basedOn w:val="Normal"/> <w:pPr> <w:pStyle w:val="Heading2"/> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/> <w:outlineLvl w:val="1"/></w:pPr> <w:rPr> <wx:font wx:val="Times New Roman"/> <w:b/> <w:b-cs/> <w:sz w:val="36"/> <w:sz-cs w:val="36"/></w:rPr></w:style> <w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont"> <w:name w:val="Default Paragraph Font"/> <w:semiHidden/></w:style> <w:style w:type="table" w:default="on" w:styleId="TableNormal"> <w:name w:val="Normal Table"/> <wx:uiName wx:val="Table Normal"/> <w:semiHidden/> <w:rPr> <wx:font wx:val="Times New Roman"/></w:rPr> <w:tblPr> <w:tblInd w:w="0" w:type="dxa"/> <w:tblCellMar> <w:top w:w="0" w:type="dxa"/> <w:left w:w="108" w:type="dxa"/> <w:bottom w:w="0" w:type="dxa"/> <w:right w:w="108" w:type="dxa"/></w:tblCellMar></w:tblPr></w:style> <w:style w:type="list" w:default="on" w:styleId="NoList"> <w:name w:val="No List"/> <w:semiHidden/></w:style> <w:style w:type="paragraph" w:styleId="NormalWeb"> <w:name w:val="Normal (Web)"/> <w:basedOn w:val="Normal"/> <w:pPr> <w:pStyle w:val="NormalWeb"/> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/></w:pPr> <w:rPr> <wx:font wx:val="Times New Roman"/></w:rPr></w:style></w:styles> <w:divs> <w:div w:id="1337466263"> <w:marLeft w:val="0"/> <w:marRight w:val="0"/> <w:marTop w:val="0"/> <w:marBottom w:val="0"/> <w:divBdr> <w:top w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/> <w:left w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/> <w:bottom w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/> <w:right w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/></w:divBdr></w:div></w:divs> <w:shapeDefaults> <o:shapedefaults v:ext="edit" spidmax="2050"/> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" 
data="1"/></o:shapelayout></w:shapeDefaults> <w:docPr> <w:view w:val="web"/> <w:zoom w:percent="100"/> <w:attachedTemplate w:val=""/> <w:defaultTabStop w:val="720"/> <w:characterSpacingControl w:val="DontCompress"/> <w:webPageEncoding w:val="us-ascii"/> <w:optimizeForBrowser/> <w:validateAgainstSchema/> <w:saveInvalidXML w:val="off"/> <w:ignoreMixedContent w:val="off"/> <w:alwaysShowPlaceholderText w:val="off"/> <w:compat/></w:docPr> <w:body> <wx:sect> <wx:sub-section> <w:p> <w:pPr> <w:pStyle w:val="Heading1"/></w:pPr> <w:r> <w:t>A heading</w:t></w:r></w:p> <w:p> <w:pPr> <w:pStyle w:val="NormalWeb"/></w:pPr> <w:r> <w:t>A paragraph</w:t></w:r></w:p> <wx:sub-section> <w:p> <w:pPr> <w:pStyle w:val="Heading2"/> <w:divId w:val="1337466263"/></w:pPr> <w:r> <w:t>A subheading</w:t></w:r></w:p> <w:tbl> <w:tblPr> <w:tblW w:w="0" w:type="auto"/> <w:tblCellSpacing w:w="15" w:type="dxa"/> <w:tblCellMar> <w:top w:w="15" w:type="dxa"/> <w:left w:w="15" w:type="dxa"/> <w:bottom w:w="15" w:type="dxa"/> <w:right w:w="15" w:type="dxa"/></w:tblCellMar></w:tblPr> <w:tblGrid> <w:gridCol w:w="240"/> <w:gridCol w:w="225"/></w:tblGrid> <w:tr> <w:trPr> <w:divId w:val="1337466263"/> <w:tblCellSpacing w:w="15" w:type="dxa"/></w:trPr> <w:tc> <w:tcPr> <w:tcW w:w="0" w:type="auto"/> <w:vAlign w:val="center"/></w:tcPr> <w:p> <w:pPr> <w:jc w:val="center"/> <w:rPr> <w:b/> <w:b-cs/></w:rPr></w:pPr> <w:r> <w:rPr> <w:b/> <w:b-cs/></w:rPr> <w:t>A</w:t></w:r></w:p></w:tc> <w:tc> <w:tcPr> <w:tcW w:w="0" w:type="auto"/> <w:vAlign w:val="center"/></w:tcPr> <w:p> <w:r> <w:t>B</w:t></w:r></w:p></w:tc></w:tr> <w:tr> <w:trPr> <w:divId w:val="1337466263"/> <w:tblCellSpacing w:w="15" w:type="dxa"/></w:trPr> <w:tc> <w:tcPr> <w:tcW w:w="0" w:type="auto"/> <w:vAlign w:val="center"/></w:tcPr> <w:p> <w:r> <w:t>1</w:t></w:r></w:p></w:tc> <w:tc> <w:tcPr> <w:tcW w:w="0" w:type="auto"/> <w:vAlign w:val="center"/></w:tcPr> <w:p> <w:r> <w:t>2</w:t></w:r></w:p></w:tc></w:tr></w:tbl> <w:p> <w:pPr> <w:listPr> <w:ilvl 
w:val="0"/> <w:ilfo w:val="1"/> <wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="270"/> <wx:font wx:val="Symbol"/></w:listPr> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/> <w:divId w:val="1337466263"/></w:pPr> <w:r> <w:t>a</w:t></w:r></w:p> <w:p> <w:pPr> <w:listPr> <w:ilvl w:val="0"/> <w:ilfo w:val="1"/> <wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="270"/> <wx:font wx:val="Symbol"/></w:listPr> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/> <w:divId w:val="1337466263"/></w:pPr> <w:r> <w:t>bullet</w:t></w:r></w:p> <w:p> <w:pPr> <w:listPr> <w:ilvl w:val="0"/> <w:ilfo w:val="1"/> <wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="270"/> <wx:font wx:val="Symbol"/></w:listPr> <w:spacing w:before="100" w:before-autospacing="on" w:after="100" w:after-autospacing="on"/> <w:divId w:val="1337466263"/></w:pPr> <w:r> <w:t>list</w:t></w:r></w:p> <w:sectPr> <w:pgSz w:w="12240" w:h="15840"/> <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/> <w:cols w:space="720"/> <w:docGrid w:line-pitch="360"/></w:sectPr></wx:sub-section></wx:sub-section></wx:sect></w:body></w:wordDocument>
Old ODF from Open Office 2.0.2
<?xml version="1.0" encoding="UTF-8"?> <office:document-content xmlns:office="http://openoffice.org/2000/office" xmlns:style="http://openoffice.org/2000/style" xmlns:text="http://openoffice.org/2000/text" xmlns:table="http://openoffice.org/2000/table" xmlns:draw="http://openoffice.org/2000/drawing" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="http://openoffice.org/2000/meta" xmlns:number="http://openoffice.org/2000/datastyle" xmlns:svg="http://www.w3.org/2000/svg" xmlns:chart="http://openoffice.org/2000/chart" xmlns:dr3d="http://openoffice.org/2000/dr3d" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="http://openoffice.org/2000/form" xmlns:script="http://openoffice.org/2000/script" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" office:version="1.0" office:class="text"> <office:script/> <office:font-decls> <style:font-decl style:name="StarSymbol" fo:font-family="StarSymbol" style:font-charset="x-symbol"/> <style:font-decl style:name="Lucidasans1" fo:font-family="Lucidasans"/> <style:font-decl style:name="Arial Unicode MS" fo:font-family="'Arial Unicode MS'" style:font-pitch="variable"/> <style:font-decl style:name="Bitstream Vera Sans" fo:font-family="'Bitstream Vera Sans'" style:font-pitch="variable"/> <style:font-decl style:name="HG Mincho Light J" fo:font-family="'HG Mincho Light J'" style:font-pitch="variable"/> <style:font-decl style:name="Lucidasans" fo:font-family="Lucidasans" style:font-pitch="variable"/> <style:font-decl style:name="Bitstream Vera Serif" fo:font-family="'Bitstream Vera Serif'" style:font-family-generic="roman" style:font-pitch="variable"/> <style:font-decl 
style:name="Thorndale" fo:font-family="Thorndale" style:font-family-generic="roman" style:font-pitch="variable"/> <style:font-decl style:name="Albany" fo:font-family="Albany" style:font-family-generic="swiss" style:font-pitch="variable"/></office:font-decls> <office:automatic-styles> <style:style style:name="Table1" style:family="table"> <style:properties style:width="0.4389inch" table:align="left"/></style:style> <style:style style:name="Table1.A" style:family="table-column"> <style:properties style:column-width="0.209inch"/></style:style> <style:style style:name="Table1.B" style:family="table-column"> <style:properties style:column-width="0.2299inch"/></style:style> <style:style style:name="Table1.A1" style:family="table-cell"> <style:properties fo:vertical-align="middle" fo:padding="0.0194inch" fo:border="none"/></style:style> <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Heading 1" style:master-page-name="HTML"/> <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Text body" style:list-style-name="L2"> <style:properties fo:margin-top="0inch" fo:margin-bottom="0inch"/></style:style> <style:style style:name="P3" style:family="paragraph" style:parent-style-name="Text body" style:list-style-name="L2"/> <text:list-style style:name="L1"> <text:list-level-style-number text:level="1" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="2" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="3" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="4" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="5" style:num-format=""> 
<style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="6" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="7" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="8" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="9" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number> <text:list-level-style-number text:level="10" style:num-format=""> <style:properties text:min-label-distance="0.15inch"/></text:list-level-style-number></text:list-style> <text:list-style style:name="L2"> <text:list-level-style-bullet text:level="1" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="0.2945inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="2" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="0.7854inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="3" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="1.2764inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="4" text:style-name="Bullet Symbols" style:num-suffix="." 
text:bullet-char="•"> <style:properties text:space-before="1.7673inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="5" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="2.2583inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="6" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="2.7492inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="7" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="3.2402inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="8" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="3.7315inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="9" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="•"> <style:properties text:space-before="4.2224inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="10" text:style-name="Bullet Symbols" style:num-suffix="." 
text:bullet-char="•"> <style:properties text:space-before="4.7134inch" text:min-label-width="0.1965inch" style:font-name="StarSymbol"/></text:list-level-style-bullet></text:list-style></office:automatic-styles> <office:body> <office:forms form:automatic-focus="false" form:apply-design-mode="false"/> <text:sequence-decls> <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/> <text:sequence-decl text:display-outline-level="0" text:name="Table"/> <text:sequence-decl text:display-outline-level="0" text:name="Text"/> <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls> <text:h text:style-name="P1" text:level="1">A heading</text:h> <text:p text:style-name="Text body">A paragraph</text:p> <text:h text:style-name="Heading 2" text:level="2" text:is-list-header="true">A subheading</text:h> <table:table table:name="Table1" table:style-name="Table1"> <table:table-column table:style-name="Table1.A"/> <table:table-column table:style-name="Table1.B"/> <table:table-row> <table:table-cell table:style-name="Table1.A1" table:value-type="string"> <text:p text:style-name="Table Heading">A</text:p></table:table-cell> <table:table-cell table:style-name="Table1.A1" table:value-type="string"> <text:p text:style-name="Table Contents">B</text:p></table:table-cell></table:table-row> <table:table-row> <table:table-cell table:style-name="Table1.A1" table:value-type="string"> <text:p text:style-name="Table Contents">1</text:p></table:table-cell> <table:table-cell table:style-name="Table1.A1" table:value-type="string"> <text:p text:style-name="Table Contents">2</text:p></table:table-cell></table:table-row></table:table> <text:ordered-list text:style-name="L2"> <text:list-item> <text:p text:style-name="P2">a </text:p></text:list-item> <text:list-item> <text:p text:style-name="P2">bullet </text:p></text:list-item> <text:list-item> <text:p text:style-name="P3">list 
</text:p></text:list-item></text:ordered-list></office:body></office:document-content>
New ODF from Open Office 2.0.2
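Very similar to the old version, apart from the namespaces (OASIS URNs instead of openoffice.org URLs) and the use of SVG conventions rather than XSL-FO conventions for the font declarations (svg:font-family where the old format had fo:font-family).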
<?xml version="1.0" encoding="UTF-8"?> <office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" office:version="1.0"> <office:scripts/> <office:font-face-decls> <style:font-face style:name="StarSymbol" svg:font-family="StarSymbol" style:font-charset="x-symbol"/> <style:font-face style:name="Lucidasans1" svg:font-family="Lucidasans"/> <style:font-face style:name="Arial Unicode MS" svg:font-family="'Arial Unicode MS'" style:font-pitch="variable"/> <style:font-face style:name="Bitstream Vera Sans" svg:font-family="'Bitstream Vera Sans'" style:font-pitch="variable"/> <style:font-face style:name="HG Mincho Light J" svg:font-family="'HG Mincho Light J'" style:font-pitch="variable"/> <style:font-face style:name="Lucidasans" 
svg:font-family="Lucidasans" style:font-pitch="variable"/> <style:font-face style:name="Bitstream Vera Serif" svg:font-family="'Bitstream Vera Serif'" style:font-family-generic="roman" style:font-pitch="variable"/> <style:font-face style:name="Thorndale" svg:font-family="Thorndale" style:font-family-generic="roman" style:font-pitch="variable"/> <style:font-face style:name="Albany" svg:font-family="Albany" style:font-family-generic="swiss" style:font-pitch="variable"/></office:font-face-decls> <office:automatic-styles> <style:style style:name="Table1" style:family="table"> <style:table-properties style:width="0.4389in" table:align="left"/></style:style> <style:style style:name="Table1.A" style:family="table-column"> <style:table-column-properties style:column-width="0.209in"/></style:style> <style:style style:name="Table1.B" style:family="table-column"> <style:table-column-properties style:column-width="0.2299in"/></style:style> <style:style style:name="Table1.A1" style:family="table-cell"> <style:table-cell-properties style:vertical-align="middle" fo:padding="0.0194in" fo:border="none"/></style:style> <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Heading_20_1" style:master-page-name="HTML"/> <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Text_20_body" style:list-style-name="L2"> <style:paragraph-properties fo:margin-top="0in" fo:margin-bottom="0in"/></style:style> <style:style style:name="P3" style:family="paragraph" style:parent-style-name="Text_20_body" style:list-style-name="L2"/> <text:list-style style:name="L1"> <text:list-level-style-number text:level="1" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="2" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="3" style:num-format=""> 
<style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="4" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="5" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="6" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="7" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="8" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="9" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number> <text:list-level-style-number text:level="10" style:num-format=""> <style:list-level-properties text:min-label-distance="0.15in"/></text:list-level-style-number></text:list-style> <text:list-style style:name="L2"> <text:list-level-style-bullet text:level="1" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="0.2945in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="2" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="0.7854in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="3" text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="•"> <style:list-level-properties text:space-before="1.2764in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="4" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="1.7673in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="5" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="2.2583in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="6" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="2.7492in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="7" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="3.2402in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="8" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="3.7315in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="9" text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="•"> <style:list-level-properties text:space-before="4.2224in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet> <text:list-level-style-bullet text:level="10" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="•"> <style:list-level-properties text:space-before="4.7134in" text:min-label-width="0.1965in"/> <style:text-properties style:font-name="StarSymbol"/></text:list-level-style-bullet></text:list-style></office:automatic-styles> <office:body> <office:text> <office:forms form:automatic-focus="false" form:apply-design-mode="false"/> <text:sequence-decls> <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/> <text:sequence-decl text:display-outline-level="0" text:name="Table"/> <text:sequence-decl text:display-outline-level="0" text:name="Text"/> <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls> <text:h text:style-name="P1" text:outline-level="1">A heading</text:h> <text:p text:style-name="Text_20_body">A paragraph</text:p> <text:h text:style-name="Heading_20_2" text:outline-level="2" text:is-list-header="true">A subheading</text:h> <table:table table:name="Table1" table:style-name="Table1"> <table:table-column table:style-name="Table1.A"/> <table:table-column table:style-name="Table1.B"/> <table:table-row> <table:table-cell table:style-name="Table1.A1" office:value-type="string"> <text:p text:style-name="Table_20_Heading">A</text:p></table:table-cell> <table:table-cell table:style-name="Table1.A1" office:value-type="string"> <text:p text:style-name="Table_20_Contents">B</text:p></table:table-cell></table:table-row> <table:table-row> <table:table-cell table:style-name="Table1.A1" office:value-type="string"> <text:p text:style-name="Table_20_Contents">1</text:p></table:table-cell> <table:table-cell table:style-name="Table1.A1" office:value-type="string"> <text:p 
text:style-name="Table_20_Contents">2</text:p></table:table-cell></table:table-row></table:table> <text:list text:style-name="L2"> <text:list-item> <text:p text:style-name="P2">a </text:p></text:list-item> <text:list-item> <text:p text:style-name="P2">bullet </text:p></text:list-item> <text:list-item> <text:p text:style-name="P3">list </text:p></text:list-item></text:list></office:text></office:body></office:document-content>
Microsoft Office Open XML from Office 2007 beta
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
<pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
<pkg:xmlData>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships></pkg:xmlData></pkg:part>
<pkg:part pkg:name="/word/_rels/document.xml.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="256">
<pkg:xmlData>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering" Target="numbering.xml"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
</Relationships></pkg:xmlData></pkg:part>
<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
<pkg:xmlData>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:o12="http://schemas.microsoft.com/office/2004/7/core"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.microsoft.com/office/omml/2004/12/core"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/3/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/3/main">
<w:body>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>A heading</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="NormalWeb"/></w:pPr>
<w:r w:rsidR="0019729A">
<w:t>A paragraph</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading2"/>
<w:divId w:val="80100962"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>A subheading</w:t></w:r></w:p>
<w:tbl>
<w:tblPr>
<w:tblW w:w="0" w:type="auto"/>
<w:tblCellSpacing w:w="15" w:type="dxa"/>
<w:tblCellMar>
<w:top w:w="15" w:type="dxa"/>
<w:left w:w="15" w:type="dxa"/>
<w:bottom w:w="15" w:type="dxa"/>
<w:right w:w="15" w:type="dxa"/></w:tblCellMar></w:tblPr>
<w:tblGrid>
<w:gridCol w:w="249"/>
<w:gridCol w:w="236"/>
</w:tblGrid>
<w:tr w:rsidR="00000000">
<w:trPr>
<w:divId w:val="80100962"/>
<w:tblCellSpacing w:w="15" w:type="dxa"/></w:trPr>
<w:tc>
<w:tcPr>
<w:tcW w:w="0" w:type="auto"/>
<w:vAlign w:val="center"/></w:tcPr>
<w:p>
<w:pPr>
<w:jc w:val="center"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/>
<w:b/>
<w:bCs/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/>
<w:b/>
<w:bCs/></w:rPr>
<w:t>A</w:t></w:r></w:p></w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="0" w:type="auto"/>
<w:vAlign w:val="center"/></w:tcPr>
<w:p>
<w:pPr>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>B</w:t></w:r></w:p></w:tc></w:tr>
<w:tr w:rsidR="00000000">
<w:trPr>
<w:divId w:val="80100962"/>
<w:tblCellSpacing w:w="15" w:type="dxa"/></w:trPr>
<w:tc>
<w:tcPr>
<w:tcW w:w="0" w:type="auto"/>
<w:vAlign w:val="center"/></w:tcPr>
<w:p>
<w:pPr>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>1</w:t></w:r></w:p></w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="0" w:type="auto"/>
<w:vAlign w:val="center"/></w:tcPr>
<w:p>
<w:pPr>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>2</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
<w:p>
<w:pPr>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/></w:numPr>
<w:spacing w:before="100" w:beforeAutospacing="1" w:after="100"
w:afterAutospacing="1"/>
<w:divId w:val="80100962"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>a</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/></w:numPr>
<w:spacing w:before="100" w:beforeAutospacing="1" w:after="100"
w:afterAutospacing="1"/>
<w:divId w:val="80100962"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>bullet</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/></w:numPr>
<w:spacing w:before="100" w:beforeAutospacing="1" w:after="100"
w:afterAutospacing="1"/>
<w:divId w:val="80100962"/>
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr></w:pPr>
<w:r w:rsidR="0019729A">
<w:rPr>
<w:rFonts w:eastAsia="Times New Roman"/></w:rPr>
<w:t>list</w:t></w:r></w:p>
<w:sectPr w:rsidR="0019729A">
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
w:header="708" w:footer="708" w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document></pkg:xmlData>
</pkg:part>
<pkg:part pkg:name="/word/theme/theme1.xml"
pkg:contentType="application/vnd.openxmlformats-officedocument.theme+xml">
<pkg:xmlData pkg:originalXmlStandalone="no">
<!-- Hundreds of lines removed here --></pkg:xmlData>
</pkg:part>
</pkg:package>
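For readers who want to pull individual parts back out of a single-file package like the one above, here is a minimal sketch in Python. It assumes only what the listing shows: a `pkg:package` root in the `http://schemas.microsoft.com/office/2006/xmlPackage` namespace with `pkg:part` children carrying `pkg:name` and `pkg:contentType`. The `SAMPLE` string and the `list_parts` helper are hypothetical, cut down for illustration, and are not part of any Office tooling.

```python
import xml.etree.ElementTree as ET

# Namespace of the flat package wrapper, as it appears in the listing above.
PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"

# Hypothetical, cut-down package in the same shape as the real one.
SAMPLE = """<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/_rels/.rels"
            pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
    <pkg:xmlData><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"/></pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/document.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData><doc>placeholder</doc></pkg:xmlData>
  </pkg:part>
</pkg:package>"""


def list_parts(flat_xml):
    """Return {part name: content type} for every pkg:part in the package."""
    root = ET.fromstring(flat_xml)
    parts = {}
    for part in root.findall(f"{{{PKG}}}part"):
        name = part.get(f"{{{PKG}}}name")
        parts[name] = part.get(f"{{{PKG}}}contentType")
    return parts


parts = list_parts(SAMPLE)
for name, content_type in parts.items():
    print(name, "->", content_type)
```

The part names mirror the paths the same files would have inside the ZIP form, which is why "/word/document.xml" shows up in both.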
XSL-FO
<?xml version="1.0" encoding="ISO-8859-1" ?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:fox="http://xml.apache.org/fop/extensions"> <!-- Creator="html2fo" Version="0.4.2" --> <fo:layout-master-set> <fo:simple-page-master margin-right="2.0cm" margin-left="2.0cm" margin-bottom="1.0cm" margin-top="1.0cm" page-width="21cm" page-height="29.7cm" master-name="first"> <fo:region-before extent="1.5cm"/> <fo:region-body margin-bottom="1.5cm" margin-top="1.5cm"/> <fo:region-after extent="1.0cm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="first" language="en" hyphenate="true"> <fo:static-content flow-name="xsl-region-before"> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">An Example</fo:block></fo:static-content> <fo:static-content flow-name="xsl-region-after"> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always"> ... the footer should be inserted here ... 
</fo:block></fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always" font-weight="bold" line-height="32pt" font-size="16pt">A heading</fo:block> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">A paragraph</fo:block> <fo:block> <fo:block space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always" line-height="32pt" font-size="16pt">A subheading</fo:block> <fo:table text-align="left" table-layout="fixed"> <fo:table-column column-width="8.15cm"/> <fo:table-column column-width="8.15cm"/> <fo:table-body> <fo:table-row> <fo:table-cell> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">A</fo:block></fo:table-cell> <fo:table-cell> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">B</fo:block></fo:table-cell></fo:table-row> <fo:table-row> <fo:table-cell> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">1</fo:block></fo:table-cell> <fo:table-cell> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">2</fo:block></fo:table-cell></fo:table-row> </fo:table-body></fo:table> <fo:list-block provisional-label-separation="3pt" provisional-distance-between-starts="14pt"> <fo:list-item> <fo:list-item-label> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">-</fo:block></fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">a</fo:block></fo:list-item-body></fo:list-item> <fo:list-item> 
<fo:list-item-label> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">-</fo:block></fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">bullet</fo:block></fo:list-item-body></fo:list-item> <fo:list-item> <fo:list-item-label> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">-</fo:block></fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block line-height="12pt" font-size="10pt" space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always">list</fo:block></fo:list-item-body></fo:list-item> </fo:list-block> </fo:block> <fo:block space-before.optimum="1.5pt" space-after.optimum="1.5pt" keep-together="always" id="LastPage" line-height="1pt" font-size="1pt"></fo:block></fo:flow> </fo:page-sequence></fo:root>


You might have started with an "Original HTML" with code that validates. E.g., where is your DOCTYPE, charset declaration, etc?
There is no need for a charset declaration because I only used ASCII. Consequently every possible default charset potentially involved (text/plain = ASCII, text/html = ISO 8859-1, xml = UTF-8) will be correct.
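The charset point can be checked in a few lines. This is only a sketch: it takes one string from the example document and confirms that ASCII, ISO 8859-1 and UTF-8 all produce the same bytes for it, then shows that a non-ASCII character (chosen here purely for illustration) breaks that equivalence.

```python
# ASCII-only text from the example document.
text = "A heading"
encodings = ["ascii", "iso-8859-1", "utf-8"]

# Encode the same text under each candidate default charset.
encoded = {name: text.encode(name) for name in encodings}

# All three byte sequences are identical for pure ASCII input.
same_bytes = len(set(encoded.values())) == 1

# And the ASCII bytes decode back to the same text under every charset.
round_trips = all(encoded["ascii"].decode(name) == text for name in encodings)

# A non-ASCII character is where the defaults start to disagree.
non_ascii_differs = "é".encode("iso-8859-1") != "é".encode("utf-8")

print(same_bytes, round_trips, non_ascii_differs)
```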
As for DOCTYPE, surely you are trolling? There are lots of variant possibilities; more interesting ones would be what happens when the XHTML namespace is used and what happens when class attributes are used.
Isn't it amazing how much effort goes into "formatting" and how little care goes into semantic compatibility, in all document formats? One would expect those demanding compatibility to be interested in what is IN the documents, rather than only in how they look.
All complex proprietary tags are just obscuring the content more and more. This is also a bad design, since content and formatting are mixed. HTML may still be the best solution, because it can separate most of formatting into CSS easily.
Not to mention the absolute lack of the "linking" capability that made modern web search (Google and similar) so useful. Is there an equivalent of the HTML "a" tag in other formats that can reference outside of the document? Or something like: arizona_trip_2006.doc#day5
The simplicity of the HTML says it all.
I've had to translate a fair amount of RTF into HTML and then use it in report generators. The simplicity of raw HTML beats the heck out of any XML format with built-in compatibilities.
What do you get for giving up future and past proofing? Conservation of effort.
Hmm, comparing HTML w/o CSS with formats including style definitions, isn't it like comparing oranges with apples?
And isn't the example too simple? I don't mean that layout languages should be complex, but the last 10 years of HTML have shown that HTML has its limitations, probably because it is used nowadays for things it was never intended for.
xix [nine-teen]
Of course, even with HTML + CSS - which DOES get considerably farther down the stylistic pike, you're still talking about a surprisingly minimal set of information required compared to any of the "formal" page layout specifications. However, I think a more honest test may be to put together a formal page layout that needs to be replicated in any of these languages, including HTML+CSS. I suspect that at least some of the disparity in sizes might disappear.
What I wanted to see with these files was just how text, lists and tables were handled in the different formats. xix is right that size is not everything (it is not nothing, on the other hand.) But Dragan is right about semantics too. And, I should mention that the Open Office XML format is not fully baked yet.
Interestingly, there is an error in the HTML: the first TH has a TD end tag. The HTML generated by Word 2000 strips out the TD and replaces it with bold and generates an extra row. Yuck. The XSL-FO strips out info about whether h1 or p is used: perhaps this is not intrinsic to FO and there is a way to keep this info, I don't know. But we couldn't recover the original HTML from the FO.
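To make the FO point concrete, here is a small sketch that walks a fragment shaped like the HTML2FO output and reports what each block is called. The `SAMPLE` fragment is trimmed from the listing above and `describe_blocks` is a hypothetical helper. The heading and the paragraph both come back as plain `block` elements, so the only remaining hint is the formatting (font size, weight), and any recovery of h1 versus p would have to be a guess based on that.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"

# Fragment trimmed from the XSL-FO listing: one former h1, one former p.
SAMPLE = """<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <fo:block font-weight="bold" line-height="32pt" font-size="16pt">A heading</fo:block>
  <fo:block line-height="12pt" font-size="10pt">A paragraph</fo:block>
</fo:flow>"""


def describe_blocks(fo_xml):
    """Return (local element name, font-size, text) for each fo:block."""
    root = ET.fromstring(fo_xml)
    rows = []
    for block in root.iter(f"{{{FO}}}block"):
        local_name = block.tag.split("}")[1]  # drop the "{namespace}" prefix
        rows.append((local_name, block.get("font-size"), (block.text or "").strip()))
    return rows


rows = describe_blocks(SAMPLE)
for row in rows:
    print(row)
```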
ODF and the Word XML outputs do preserve enough information in the input to regenerate the original HTML. Except for the element.
oops...I mean the <div> element
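As a rough illustration of why the Word XML keeps enough information, the sketch below maps paragraph style names back to HTML element names. It assumes the beta namespace and the style names (`Heading1`, `Heading2`, `NormalWeb`) seen in the Office 2007 listing above. The `STYLE_TO_TAG` table and `paragraphs_to_html` are hypothetical, and tables and lists are left out entirely.

```python
import xml.etree.ElementTree as ET
from html import escape

# Beta-era WordprocessingML namespace, copied from the listing above.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/3/main"
NS = {"w": W}

# Hypothetical mapping from the style names in the listing to HTML elements.
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2", "NormalWeb": "p"}

SAMPLE = """<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/3/main">
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>A heading</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="NormalWeb"/></w:pPr><w:r><w:t>A paragraph</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>A subheading</w:t></w:r></w:p>
</w:body>"""


def paragraphs_to_html(body_xml):
    """Turn each top-level w:p into an HTML element chosen by its w:pStyle."""
    body = ET.fromstring(body_xml)
    out = []
    for p in body.findall("w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_name = style.get(f"{{{W}}}val") if style is not None else None
        tag = STYLE_TO_TAG.get(style_name, "p")  # unknown styles fall back to <p>
        text = "".join(t.text or "" for t in p.findall("w:r/w:t", NS))
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return out


html_lines = paragraphs_to_html(SAMPLE)
print("\n".join(html_lines))
```

The same approach should carry over to ODF, where `text:h` with `text:outline-level` and `text:p` carry the equivalent distinction, though that variant is not shown here.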
I'd like to see a format like DocBook in this comparison, too.
Size is proportional to the handling semantics being exposed. To handle multiple subjective views for any given element, the underlying object is complexified and this feeds back into the external representation. Think of the markup as the shape of the manifold to which the semantics are mapped. The HTML application objects are doing just a few things ok. The other application objects are doing many things well. Flexibility comes at the cost of a very lumpy manifold. Any mapping to it shares that cost, expressed in the markup.
Very interesting. The relative sizes of these formats are unbelievable, and the actual markup looks hard to read.
So all that is due to the formats trying to be more flexible? Which capabilities included in these formats have caused such size and complexity?
I have another blog entry on the issue of size and complexity, with some XSLT code.
WordProcessingML, OpenDocument and XSL FO are designed to solve different problems than HTML. They focus on layout, formatting and printing. Wordprocessing applications are used (or should be used) to create printed documents, using them in a purely digital way/world is quite pointless.
Furthermore you're not comparing the formats but tools for creating those formats against your HTML skills. I'm not aware of any WYSIWYG HTML editor that will create such compact HTML code.
I recently did a project that involved creating WordProcessingML documents, and I can tell you that you can strip nearly half of the markup from your file without losing anything.
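The comment does not say which markup can go, so the following is only one guess, labelled as such: the `w:rsidR` attributes in the listing are revision-save identifiers and carry no text or formatting, so a sketch that drops every attribute whose local name starts with `rsid` at least shows the idea. `strip_rsids` and `SAMPLE` are hypothetical, and this is nowhere near the "half of the markup" claimed.

```python
import xml.etree.ElementTree as ET

# One paragraph in the shape of the Office 2007 listing above.
SAMPLE = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/3/main">'
    '<w:pPr><w:pStyle w:val="NormalWeb"/></w:pPr>'
    '<w:r w:rsidR="0019729A"><w:t>A paragraph</w:t></w:r>'
    '</w:p>'
)


def strip_rsids(elem):
    """Delete attributes whose local name starts with 'rsid'; return the count."""
    removed = 0
    for node in elem.iter():
        for attr in list(node.attrib):
            local = attr.split("}")[-1]  # attribute keys look like "{ns}rsidR"
            if local.startswith("rsid"):
                del node.attrib[attr]
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
before = len(ET.tostring(root))
removed = strip_rsids(root)
after = len(ET.tostring(root))
text = "".join(root.itertext())

print(f"removed {removed} attribute(s), {before} -> {after} bytes, text: {text!r}")
```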
I agree with what has been said by len and others before: HTML has a completely different goal than the other formats.
Since there is no layout information present in HTML, you rely on the browser's interpretation of your "styling" information. To generate a truly 100% un-misinterpretable layout, the document would be _very much_ longer. Thus, for fairness, you should include the business rules for interpreting the HTML of all the major browsers - let's see which one is the longest then! :)
On the other hand, it is true that MS puts loads and loads of trash in their documents. If you look at the XML structure, there is so much redundant and even unnecessary information that makes the documents so incredibly long.
To sum it up: if you want an HTML document that defines its layout properly without relying on ANY interpretation by the browser of where to position an element etc., you will end up with a longer document than a well-formed XML document with no redundancy in it. You are just not taking advantage of all the programmatic assumptions and layout definitions in the browser.
Regards,
Che
Btw: I very much agree with the comment of one person: to be fair you should compare a Dreamweaver (or any other WYSIWYG editor) HTML document to the rest...
And Rick, Jim M. has a point: your document is NOT valid. You are REQUIRED to specify a doctype for an HTML document to be valid. Just because all browsers nowadays handle HTML even without doctype does not mean it is optional!
Just ranting because it is horrible to see what people do to the HTML lately... :)
LeChe: Thanks for the comment!
To be "fair", I think I would have to try every possible application saving multiple files of different sizes and usages of styles/hardcoding to every possible format each supports. And then import them to every other application and re-save them in every possible format. This should only require a few tens of thousands of documents. :-)
The blog is entirely factual: I (tried to) make sure there was no evaluation or interpretations of the facts offered. Facts are not fair or unfair; they are pre-fair. I suppose selective presentation of facts is unfair, and incomplete facts can be misleading, but I didn't vet the results or make comments that could be misleading.
But there is very little material on the WWW that actually directly compares the different formats. Part of the reason is that people are too lazy to do it; they would prefer to sit in the armchairs coddling their prejudices.
As far as HTML validation requiring a DOCTYPE declaration goes, validating the example would not change the structures or information set in any significant way. Nor would it, I'd expect, alter any of the outputs in any significant way.