Here’s a quick follow-up to my earlier post about DocBook elements in the wild. This time I’m focusing on five more recent titles that were all typeset using our DocBook-XSL stylesheet customizations.

[Update: Based on a comment from one of the book’s authors, I should be more explicit about what I mean by “produced”: I’m referring to the period when the book was typeset by O’Reilly’s Production department(s) (and copy-edited, indexed, and reviewed). What happened during the authoring phase is unspecified. :-)]

In particular, I’m interested in whether our markup for books produced in DocBook differs greatly from that of books produced in FrameMaker or InDesign and converted into DocBook. Here’s the big graph (with numbers this time):

[Figure: element counts in the 5 books (elements_in_5_books.png)]

Right off the top, we can see that there aren’t many changes. Here’s a list of elements that appear in the 49 books but not in the 5:

-attribution
-bibliomisc
-citation
-dedication
-epigraph
-glossary
-glossdef
-glossdiv
-glossentry
-glossterm
-important
-inlinemediaobject
-keysym
-link
-refentry
-refentrytitle
-refmeta
-refname
-refnamediv
-refpurpose
-refsect1
-refsect2
-refsection
-refsynopsisdiv
-sect4
-subtitle
-symbol
-synopsis
-tfoot
-wordasword

None of those strikes me as particularly interesting (save perhaps <symbol>), especially when you consider that the set of 5 contains no reference sections or glossaries.
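A comparison like this one is easy to script. The following is a minimal sketch, not our actual tooling, assuming each book is a single well-formed DocBook 4.x file with no XML namespaces and with entities already resolved (ElementTree rejects undefined entities such as `&mdash;`). The function names `count_elements` and `compare_books` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_elements(xml_text):
    """Count every element name in one DocBook document.

    Assumes DocBook 4.x-style markup (no namespaces) with entities resolved;
    the default parser drops comments and processing instructions.
    """
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())

def compare_books(old_books, new_books):
    """Return (removed, added) element names between two sets of books."""
    old, new = Counter(), Counter()
    for text in old_books:
        old.update(count_elements(text))
    for text in new_books:
        new.update(count_elements(text))
    removed = sorted(set(old) - set(new))  # only in the old set ("-" list)
    added = sorted(set(new) - set(old))    # only in the new set ("+" list)
    return removed, added

# Tiny hypothetical stand-ins for "the 49" and "the 5"
old_books = ["<book><chapter><sect1><para>See <literal>x</literal>.</para></sect1>"
             "</chapter><glossary><glossentry/></glossary></book>"]
new_books = ["<book><chapter><section><para>Run <literal>ls</literal>:</para>"
             "<screen>a.txt</screen></section></chapter></book>"]

removed, added = compare_books(old_books, new_books)
for name in removed:
    print(f"-{name}")
for name in added:
    print(f"+{name}")
```

Keeping the full `Counter` per set (rather than just sets of names) also makes it trivial to pull the most frequent elements with `most_common(5)`.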

Here are the elements that were added:

+action
+biblioentry
+bibliography
+classname
+code
+computeroutput
+function
+option
+phrase
+prompt
+pubdate
+quote
+remark
+screen
+section
+simplesect
+uri

This is exactly what I wanted to see by looking specifically at more recent books: finer granularity of markup!

One example: if we can get authors and production editors to distinguish between runnable code (<programlisting>) and the output of a command (<screen> or <computeroutput>), we immediately have the ability to generate more meaningful downloads of example code. More creative uses for this kind of differentiation are limited only by our imagination…
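To make that concrete, here is a minimal sketch of pulling only the runnable code out of a chapter, assuming namespace-free DocBook 4.x markup and that authors have kept command output in <screen> rather than inside <programlisting>. The `example_NN.txt` naming scheme and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def runnable_examples(xml_text):
    """Collect the text of every <programlisting>.

    <screen> blocks (command output) are never visited, so they stay
    out of the download.
    """
    root = ET.fromstring(xml_text)
    # itertext() also returns text inside inline children such as <emphasis>
    return ["".join(pl.itertext()) for pl in root.iter("programlisting")]

def example_files(xml_text):
    """Map hypothetical download filenames to example code."""
    return {f"example_{i:02d}.txt": code
            for i, code in enumerate(runnable_examples(xml_text), start=1)}

chapter = """<chapter>
<programlisting>print("hello")</programlisting>
<screen>$ python hello.py
hello</screen>
<programlisting>x = <emphasis>1</emphasis></programlisting>
</chapter>"""

for name, code in example_files(chapter).items():
    print(name, repr(code))
```

The same traversal with `root.iter("screen")` would give the complementary set, for instance expected output to check examples against.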

Another change did take place at the top of the curve, though I’m not sure how important it is. Here are the top five elements in the 49-book set:

entry	57691
primary	62628
indexterm	63631
literal	97985
para	191901

And our more current five titles:

primary	10510
indexterm	11847
entry	12314
para	21059
literal	21878

Literal moves up in a big way when we look at the more recent titles, but these five are really just a subset of our current publishing (all quite technical), whereas the bigger list covers a wider range of titles and audiences. Tables (marked up with <entry>) are important in both, as are indexes.
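Because the two sets differ so much in size, raw counts are hard to compare directly. One quick normalization (my choice here, not the only reasonable one) is to express each top-five count per <para>, using only the numbers from the two lists above. On that basis <literal> goes from about 0.51 to 1.04 per para, roughly doubling, though <entry>, <indexterm>, and <primary> also rise relative to <para> in the newer set.

```python
# Top-five element counts, copied from the two lists above
set_49 = {"entry": 57691, "primary": 62628, "indexterm": 63631,
          "literal": 97985, "para": 191901}
set_5 = {"primary": 10510, "indexterm": 11847, "entry": 12314,
         "para": 21059, "literal": 21878}

def per_para(counts):
    """Express each count as occurrences per <para>, rounded to 2 places."""
    return {name: round(n / counts["para"], 2) for name, n in counts.items()}

for name in ("literal", "entry", "indexterm", "primary"):
    print(name, per_para(set_49)[name], "->", per_para(set_5)[name])
```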

This set spans 3648 pages in the printed books. On average these books are significantly bigger than the 49, with more than 700 pages and ~26500 elements per book, and slightly fewer elements per page (36.3, about two fewer). Here are the books in question: