[Update: Based on a comment from one of the book’s authors, I should be more explicit about what I mean by “produced”: I’m referring to the period when the book was typeset by O’Reilly’s Production department(s) (and copy-edited, indexed, and reviewed). What was happening during the authoring phase is unspecified. :-)]
In particular, I’m interested in whether our markup for books produced in DocBook differs greatly from books produced in FrameMaker or InDesign and converted into DocBook. Here’s the big graph (with numbers this time):
Right off the top, we can see that there aren’t too many changes. Here’s a list of elements in the 49 books that aren’t in the 5:
-attribution -bibliomisc -citation -dedication -epigraph -glossary -glossdef -glossdiv -glossentry -glossterm -important -inlinemediaobject -keysym -link -refentry -refentrytitle -refmeta -refname -refnamediv -refpurpose -refsect1 -refsect2 -refsection -refsynopsisdiv -sect4 -subtitle -symbol -synopsis -tfoot -wordasword
None of those strike me as particularly interesting (save perhaps symbol), especially when you consider that we don’t have any reference sections or glossaries in the set of 5.
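A comparison like this one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for this post: the helper name `element_names` and the two tiny sample "books" are hypothetical, and the sources are assumed to be well-formed DocBook without an XML namespace.

```python
import xml.etree.ElementTree as ET


def element_names(docbook_sources):
    """Return the set of element names used across a collection of XML strings."""
    names = set()
    for source in docbook_sources:
        root = ET.fromstring(source)
        # iter() walks the root element and every descendant element.
        names.update(elem.tag for elem in root.iter())
    return names


# Hypothetical miniature stand-ins for the two sets of books.
older_books = [
    "<book><glossary><glossentry><glossterm>XML</glossterm></glossentry></glossary></book>",
    "<book><chapter><para>See <symbol>x</symbol>.</para></chapter></book>",
]
recent_books = [
    "<book><chapter><para>Run <code>ls</code>.</para><screen>a.txt</screen></chapter></book>",
]

older = element_names(older_books)
recent = element_names(recent_books)

removed = sorted(older - recent)  # names used only in the older set
added = sorted(recent - older)    # names used only in the recent set

print(" ".join("-" + name for name in removed))
print(" ".join("+" + name for name in added))
```

Plain set difference is enough here because the question is only whether an element appears at all, not how often.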
Here are the elements that were added:
+action +biblioentry +bibliography +classname +code +computeroutput +function +option +phrase +prompt +pubdate +quote +remark +screen +section +simplesect +uri
This is exactly what I wanted to see by looking specifically at more recent books: finer granularity of markup!
One example: If we can get authors and production editors to distinguish between runnable code (<programlisting>) and the output of a command (<computeroutput>), we immediately have the ability to generate more meaningful downloads of example code. The more creative uses for this kind of differentiation are bounded only by our imagination….
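To make that concrete, here is a minimal sketch of how a download of example code might be generated once the two kinds of content are marked up separately. Everything here is an assumption for illustration: the function names and the `SAMPLE` fragment are hypothetical, and the input is assumed to be DocBook 4.x-style XML with no namespace (DocBook 5 puts its elements in a namespace, so the tag names would need qualifying).

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter fragment mixing runnable code and command output.
SAMPLE = """<chapter>
  <para>Save the program and run it:</para>
  <programlisting>print('hello')</programlisting>
  <screen><prompt>$ </prompt><userinput>python hello.py</userinput>
<computeroutput>hello</computeroutput></screen>
</chapter>"""


def extract_listings(docbook_xml):
    """Return the text of every <programlisting>: the runnable code."""
    root = ET.fromstring(docbook_xml)
    # itertext() flattens any nested inline markup down to plain text.
    return ["".join(node.itertext()) for node in root.iter("programlisting")]


def extract_output(docbook_xml):
    """Return the text of every <computeroutput>: what a command printed."""
    root = ET.fromstring(docbook_xml)
    return ["".join(node.itertext()) for node in root.iter("computeroutput")]


for code in extract_listings(SAMPLE):
    print(code)
```

Because the output lives in its own element, it never leaks into the extracted code, which is exactly the benefit of the finer-grained markup.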
Another change did take place at the top of the curve, though I’m not sure how important it is. Here are the top 5 elements in the 49-book set:
entry 57691
primary 62628
indexterm 63631
literal 97985
para 191901
And our more current five titles:
primary 10510
indexterm 11847
entry 12314
para 21059
literal 21878
Literal moves up in a big way when we look at the more recent titles, but they’re really just a subset of our current publishing (all quite technical), whereas the bigger list has a wider range of titles and audiences. Tables (marked up with <entry>) are important in both, as are indexes.
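For readers who want to produce a ranking like this for their own files, the following is a rough sketch using Python's standard library. It is not the tooling behind the numbers above; `count_elements` and the sample markup are hypothetical, and the input is again assumed to be namespace-free DocBook.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_elements(docbook_sources):
    """Tally how many times each element name occurs across XML strings."""
    counts = Counter()
    for source in docbook_sources:
        root = ET.fromstring(source)
        # Every element, including the root, contributes one to its name's tally.
        counts.update(elem.tag for elem in root.iter())
    return counts


# Hypothetical one-chapter "book" standing in for a real manuscript.
sample_books = [
    "<book><chapter>"
    "<para>Use <literal>ls</literal> and <literal>cd</literal>.</para>"
    "<para>Then <literal>pwd</literal>.</para>"
    "</chapter></book>",
]

counts = count_elements(sample_books)

# most_common(n) returns the n highest counts, largest first.
for name, total in counts.most_common(2):
    print(name, total)
```

Note that `most_common` lists the largest count first, whereas the lists above run from smallest to largest.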
This set spans 3648 pages in the printed books. These books are significantly bigger than the 49 on average, with more than 700 pages per book, slightly fewer (two fewer) elements per page (36.3) and ~26500 elements per book. Here are the books in question: