Splitting Books Open: Trends in Traditional and Online Technical Documentation
by Andy Oram
[ Author's Note: The following is based on a talk I first gave at the O'Reilly Open Source Convention, July 30, 2004. Had I turned the presentation into the kind of conventional essay that high-school English teachers used to make me read and write, the result would have been so long and tedious that no one would get through it. I have therefore left the presentation as a set of expanded bullet points, reflecting the talk the way I delivered it. In fact, this choice of an informal, oral style illustrates one of the points made in the talk. ]
While technical publishers strive to adapt to new online media and formats, online efforts at self-education by computer users are becoming a form of true grassroots documentation. This talk discusses the strengths and weaknesses of each side--traditional books and user self-education--and suggests how they may converge. It offers suggestions for improving the educational effects of mailing lists, computing project web sites, and other community documentation.
The best traditional books possess many virtues: appropriate pacing, a knowledge of the audience, meaningful technical background, and good structure. However, these traditional books take too long to write and make too many compromises in their attempt to reach a large audience and boost sales.
In turn, community education efforts (the rich environment of mailing lists, newsgroups, chat rooms, and project web sites) offer immediate answers to questions from knowledgeable peers. However, they suffer from time wasted in searching for information, results that are unreliable, and difficulties in knowing where to start. Some other current limitations of the community environment for learning are the domination of English, a difficulty in respecting cultural differences and different learning styles, and gaps in documentation.
User education can be improved by promoting active community participants to become formal contributors, incorporating professionals into community documentation, nurturing new users, pointing people to documents, and enhancing rating systems. The Safari Bookshelf is an example of professional online documentation that can enhance user efforts.
The major stepping stones within this talk are:
- Traditional Documentation: Where Is It Going?
A brief look at traditional publishers such as O'Reilly Media and where we fall short.
- Community Documentation
A review of spontaneous, community documentation--users educating themselves. The types of documentation I consider here go far beyond the works that mimic published documentation, such as the Linux Documentation Project.
- Benefits of Traditional Documentation
We return to published books to consider where they add special value to the users' education.
- The Beginnings of a Merger
Each side--traditional publishers and project leaders in the community--recognizes that the other side has something to offer and something to learn from. The two sides are therefore converging.
- Improving Community Documentation
How do we make the most of efforts at community education? This is the largest section of the talk, with specific and (I hope) useful suggestions for people leading or championing software projects.
- A Hortatory Goal
Finally, I wish to recruit readers to help me create a movement to make new efforts at documentation successful.
I hope this talk will help you:
- Track important trends in documentation.
- See how users can improve their education so they can make better use of technology.
- Eliminate much wasted time and duplicated effort.
- My experience with books
I spent ten years writing manuals for computer companies (not something to be particularly proud of) and have since spent over eleven years editing O'Reilly books, particularly on free and open source software.
My work as an editor has been fulfilling but increasingly frustrating. It's a long slog until I finish a book; then I wait three months for its release. (And that's a short time, so far as traditional publishers go.) Our goals are often muddied by time pressures and conflicting requirements.
What Went Wrong with Books?
- The Text defined
To understand why current published documentation is not meeting users' needs, we must go beyond elementary observations such as timeliness. We have to look at the tradition I call The Text with a capital T. (Actually, two capital Ts.) The Text is a timeless, unalterable artifact to be approached with reverence.
- Historical precedents
The original Text consisted of the five books of Moses. We have added many other works to this revered status: the writings of Homer, Shakespeare, and so forth. Students often analyze Shakespeare's sonnets comma by comma. I have seen a literary critic discuss the impact of how Shakespeare spelled various words in the sonnets (because spelling was much less standardized in Shakespeare's time than our own).
- The Text is often corrupt
What's odd about the concept of The Text is that many of these texts come down to us corrupted, or even in multiple versions. For instance, Shakespeare's plays exist as Folios and as Quartos. Yet they are still considered unalterable and every detail matters.
- When there is a Text in technical documentation
Only a few achievements in technical documentation have reached the status of The Text: Donald Knuth's The Art of Computer Programming, Kernighan and Ritchie's book on C, and perhaps a few others such as Charles Petzold's book on Windows programming. Books with words such as "Bible" in their titles come nowhere near canonical status.
Requirements for Writing The Text
- The work takes a long time
You want it to be perfect, so you have to take time to make it so.
- The work must remain relevant for a long time
Because you took so long to write it (and will take so long to write a revision), the work must be of interest to readers for a long time. Many books never get started because everybody involved in the project knows the contents will be obsolete before any money can be recouped.
- The work must sell a minimum number of copies
The costs of editing, producing, and marketing the work lead to steep requirements for sales. Often at O'Reilly, we like a project and wish to promote it, but we can't justify doing the book on the basis of expected sales. We like to tell the author, "We can't afford to do your book--yet," which pleasantly leaves open the possibility for future collaboration if the project takes off. But this practice delays our entry into the market if/when we decide the book could prove worthwhile.
- The work must provide a lot of background to accommodate multiple audiences
This is where the problems of The Text get subtle and interesting. Because the publisher is required to sell a lot of copies, the publisher and author are tempted to pad the book with extra material that may appeal to one segment of an audience or another. This resembles politicians making speeches, where you can identify the sentence aimed at the Latino audience, the sentence aimed at soccer moms, and so forth.
Many people purchase a computer book that looks interesting based on its title and back cover, only to find that just a couple of chapters are of interest to them. If you feel that a lot of the book was written for somebody else, you are probably right.
The Text and, therefore, the publishing model we've had for centuries don't match well to fast-moving technical fields. But the tragedy of published books is compounded by some more mundane factors.
Most technical books are quite poor. And unfortunately, people in technical fields have come to assume that this is documentation's natural state. From their first encounters in elementary school, users expect unreadable texts. They get their first chemistry or math textbook and find basic writing failures such as:
- Long explanations of arcane topics with no justification.
- Fussy, nit-picking distinctions that interrupt the flow of ideas.
- Terms used before they are defined.
I'm convinced that the frustration of trying to make sense out of these poor textbooks is the reason many talented students drop out of math and science--a real tragedy. Those who carry on learn to tolerate bad texts through high school, college, graduate school, and right up to when they get a job and crack open a computer book.
There are many reasons for documentation's move online and to the user community. One can easily cite costs, speed of updates, and so forth. But Aristotle, who identified four causes for things to happen, described the final cause as the force that really drives change forward--what makes something have to happen. The final cause for the move to online community documentation is this: it is more successful than The Text at meeting user needs.
- Have you ever answered a query on an online forum, such as a mailing list, newsgroup, or chat? And the accompanying question: Did you think you were creating documentation?
Some people said they did. At the very least, they were leaving a written record of new information.
- Have you had your answer on the online forum archived?
If so, the archive makes a strong case for calling the answer documentation.
- Have you searched an online forum for an answer?
This clinches the question. Someone has created useful information that you have used to solve your problem, without you knowing the person or perhaps having any relationship with the group in which he or she created the information. In my book, that's documentation. And in fact, many system administrators tell me that when they encounter a need for information, their first recourse is neither a printed book nor an official online page, but their favorite search engine.
- Have you asked a question on an online forum?
And did you realize you too were contributing to documentation? I think you were. You identified missing information and mobilized a set of people to fill the need.
Conclusion: documentation is in your hands.
User Education Is Community Education
- Community is built with one IRC chat, one newsgroup, one mailing list at a time
When you join a group and answer someone's question, or pose questions that other people answer, the people in the group are taking care of each other. That's what makes it a community.
- User groups
Technical support groups need not be just virtual ones with an online existence. For instance, near the beginning of Linux's spread, the hardest thing about Linux was simply getting it installed. (On some hardware, it's still hard.) So people in major cities around the world would come together in "installfests" to help each other get Linux up and running. Of course, these groups have online components too.
In short, a community is people taking care of each other, which is what many mailing lists, newsgroups, and chat rooms do.
Characteristics of Future Documentation
Community efforts such as mailing lists show us where technical documentation as a whole is heading.
Online distribution allows documentation to be published and updated instantaneously.
- Freely distributable
The documentation will be available to everybody in the world with Internet access. You can safely refer people to it without worrying whether they can obtain it in their part of the world, or whether they can afford it.
- Localized and topical
By these terms, I mean that documentation can be written for a specific audience with an immediate need. For instance, instead of a single huge book about MySQL, someone may write a tutorial on MySQL for Oracle programmers interested in migrating, someone else may write a tutorial on MySQL for web administrators interested in adding dynamic content to their sites, and so forth.
Advantages of Future Documentation
- It can be done in one's spare time by someone close to the topic and the audience
When I plan a book with an author, it's daunting. I lay out a schedule for a year in advance and the author often has to take time off from some other activity, such as teaching. In contrast, a short web page can be written by somebody with a passing interest over a weekend.
- It can be as small as a question and answer
We saw this while discussing the use of mailing lists and newsgroups.
- Distribution costs are minimal
No more paper, warehouses, and trucks. Just a web server and some bandwidth. And if the software is popular, a set of mirror sites to spread the bandwidth demands around.
- No more enormous tomes--give people as much as they'll read in a sitting
Why write a thousand-page book? Nobody reads a thousand pages in one sitting. You might as well give them a chunk of text they'll read all at once.
- No need to appeal to broad audiences
There's no thought of padding an online document. Each person can write for the people he or she is interested in helping.
So What's Not to Like?
- Lots of wasted time spent searching
While search engines and archives offer impressive results, everyone can remember a time when they had to spend too long searching--and perhaps gave up in the end.
- Can't always trust an answer
You don't know who posted the result most of the time, or the background of the person who put up the web page. Even if the information is formally correct, you might have trouble judging if it applies to your situation.
Timeliness is particularly important, because once postings and web pages go up they tend to stay up even as the software changes. And with the low cost of storage, nobody seems concerned with reclaiming disk space any more.
- You don't know what you don't know (not only is the answer hidden, but the fact that it is hidden is also hidden)
This is the most subtle of the problems. Many people don't identify what they're doing as problematic. They never think to look for alternatives to what they're doing, and wouldn't know what to ask.
For all these reasons, some type of formal documentation has traditionally been needed to get you started. Learners trace their own unique paths through a rich learning environment. Perhaps they begin with a mailing list. Having learned of good web sites to read, they come back to the mailing list with better questions to ask. They may also pick up a published book or take a seminar along the way.
This section describes in somewhat abstract fashion what makes good documentation special. The following section brings the discussion more down to earth.
By pace I mean giving users what they want, when they want it. An example of successful pacing was pointed out by a technical reviewer for one of our books who said, "Whenever I read a paragraph and had a question, I found the question was answered in the next paragraph." Now, it might have been even better if he did not have to ask the question in the first place. But clearly, this was a well-paced document.
Pace is very hard to achieve in community documentation, because few amateurs do it naturally and even professionals need the eagle eye of the editor at key points.
Audience is related to pace, because you have to understand your readers--what they know and don't know, what questions are on their minds, how their thought processes move--in order to hand out information in the proper order.
As with pace, it's hard for community authors to think about the needs of their audience. What I notice is that authors tend to write for other people just like them. This can work well, but the project leader then faces the formidable tasks of finding representatives of each audience for whom he or she wants a document, and then motivating these representatives to learn the technology and write about it.
By background I mean something more than mere theory. Many people can write thousands and thousands of words on the theory of some topic--IPv6, for instance--and just bore everybody to tears. Rather, good background yokes the theory to the immediate needs of the reader. It makes clear why certain theory has to be understood and trains the reader to apply the theoretical concepts to what he or she is doing.
This kind of background is the hardest element of technical documentation, and is rarely found in community efforts. I spend a major chunk of my editing efforts explaining to authors what background they need and how to integrate it with their work.
Structure is similar to pace, on a larger scale. It involves such choices as putting the basic tasks one needs to know before the more complex tasks that rest on them.
Structure is kind of shot when you go online. You find dozens of unrelated documents with no indication of which to read first. Some documents try to solve this problem by helpfully organizing links into a reasonable order. I'll examine some solutions later in this talk.
Luckily, we are learning to live in a less and less structured world. Open source projects depend on loose associations among trusting people. Businesses are devolving and spinning off functions. Even the military is getting less rigid. If we can tolerate less structure in life, perhaps we can tolerate less structure in our information. (But one audience member suggested that in this situation, we need even more structure in our information.)
In summary, good documentation provides the big picture. We'll see exactly what it offers in the following section.
Questions Answered by Good Books
- What range of problems does this technology solve?
The book does not say simply what a technology does. It indicates where it is useful and where it is not.
- How do different parts interact and alter each other's behaviors?
This is a matter of seeing the big picture, as mentioned earlier. Many topics don't work well when considered only in pieces.
A well-known example of a topic requiring a holistic view is security. You can fix individual parts of a system's configuration, but unless you consider them all as a whole, you'll probably leave holes.
Another example is performance tuning. Changing individual parameters of a system in isolation is like tuning a guitar by changing each string without comparing it to the other strings; you won't get anywhere good.
- What are the strengths and weaknesses of different solutions?
Like other questions, this one rises above the consideration of an individual topic and involves a comparison of several.
- What am I responsible for once I adopt the technology?
This is particularly interesting, because people who take on a new technology find themselves in a role that may require unexpected tasks. For instance, suppose you get a book on putting up a web site for your organization. You may become responsible for a number of related, critical tasks, such as security and maintaining the web pages of that organization, which in turn may lead to running other software such as content management systems.
- How do I lay the groundwork for flexibility and reliability as my system grows?
For instance, how do I write maintainable code? How do I design a system that can grow as my organization grows?
A Step Toward the Future: Safari Bookshelf
Safari Bookshelf is currently O'Reilly Media's main venture into the new world of online and dynamic documentation. Safari, a subscription service offering books in HTML, was launched in July 2001 and soon became profitable. A number of other publishers, listed on the web site, have joined. And there is little else like it in professional documentation. A few services offer books in electronic format, but Safari offers a unique combination of several elements that make it particularly valuable:
- It's in HTML, making it easy to view under many different conditions, search text, and copy examples.
- It focuses on computer and related technical documentation.
- It includes a sophisticated search interface with fields for title, author, publisher, and so forth.
- Most importantly, it offers the same high-quality, professionally written and edited text as the books from which it is drawn.
But O'Reilly Media, and in particular the developers of Safari Bookshelf, know it is not the be-all and end-all of online documentation. Because its material is identical to the printed books, it is not as useful online as a document that is designed from square one as an online document. (Tim O'Reilly talks about this in O'Reilly's E-Book Strategy.)
The online versions certainly take advantage of the medium in some simple ways, such as turning references to other parts of the document or other documents on the Internet into links. We have gradually added enhancements over time to further exploit the online medium. For instance, annotations are allowed, and you can let other readers see your annotations. This adds an element of community participation that we would like to increase.
There is lots of work to do on Safari Bookshelf, and lots of potential. As we earn more money on it, we can invest more in it. Other publishers can push the process along by starting competing services or--better for everyone, in my opinion--joining Safari Bookshelf and pushing us to speed up our development.
Potential Future Roles for Publishers
Given the pressure for more online documentation, there are many indications that the publishing industry will evolve radically, and that publishers--like movie studios, music studios, newspapers, and other content providers--will have to find new business models over the next decade.
I evaluate the changes that publishers could make by dividing what we do into two parts: what happens before the book is published, and what happens after.
- Pre-publication support for authors
This includes editing, layout, art, indexing, and technical review.
What may happen is a reversal of the current situation, where publishers take control of books and contract out to authors to write them. Instead, authors may keep overall control and contract out to publishers for particular tasks. They may say, "I need figures" or "I need a proofread."
- Post-publication support for authors
This includes publicity and obtaining book reviews. Such tasks are increasingly performed by the user community--just look at all the online book reviews, which have a notable impact on sales--but publishers have a lot of expertise here and may well be appreciated as expert mediators.
In this section, I suggest ways that project leaders and other members of software communities can improve the education they offer through online documentation and fora such as mailing lists. The topics are:
- Urge Active Community Participants to Become Formal Contributors
- Incorporate Professionals into Community Documentation
- Nurture New Users; Don't Repel Them
- Point People to Documents, Both Professional and Community-Based
- Enhance Rating Systems
- Ancillary Failings of User Education
- Who shows a tendency to post a lot, with insight?
If you run a software project, or maintain a forum such as a mailing list for that project, stay on the look-out for people who post intelligent responses to questions and seem interested in writing up what they know. Ask them whether they'd like to do something more lasting and substantial. (Publishers sometimes find authors that way.)
- Find out what motivates each writer
Why are people posting answers to questions or writing web pages about topics? Some may be consultants or trainers looking for clients. Others may just love the technology and want to see it more widely adopted. You can use these motivations as leverage.
- Offer rewards for writing
But it would be good to find money as well. This raises the question of professional involvement, which I'll discuss later.
- The Wiki
Wikis are community web pages edited collectively; people can add, delete, and change whatever they want (although changes can be logged to control malicious defacing). I can't say much about Wikis because they are new, but several impressive projects--notably, in the case of computer documentation, the Linux Wiki--show that Wikis should play a role in the process of creating community documentation. Some sites collect information through a Wiki, reject what they don't like, and organize what they do like into something more formal. But I don't see how Wikis can reproduce the traits of good books I discussed earlier, such as pace.
- Editing, design, pictures, indexing, etc. are expensive
It's worth getting these services from people who do them thirty-five hours a week or more, and have done them continuously for many years.
- Currently there are only a few rigid ways to profit from contributions
Unfortunately, unless one writes a Text, one has trouble getting remunerated for this work.
- Distributed payment systems are still just thought experiments
There are many interesting proposals for systems where people contribute micro payments into online funds and some committee (perhaps elected) disburses them to worthy projects. But these have not progressed beyond the proposal stage.
- Whittle down what is needed to the point where authors can afford professional help
As mentioned earlier, authors may take control of the writing process and bring in professionals sparingly. It's often valuable for an author to consult with an editor (and a review committee) at the very beginning of a project, to determine what sorts of background are needed and how to organize a document.
- Sponsorship has precedents
Computer companies are investing an impressive amount of money in open source software. If documentation projects are self-organized and produce good results, they should qualify for funding too. This may seem impossibly idealistic to people who are used to seeing documentation scorned and starved, but there are precedents for it. Many companies, especially during the dot-com boom, would allow their employees to buy books related to their jobs and submit expense reports. What is this but an indirect subsidy of the publication process?
- The community has to take responsibility for each member's learning
Every user has to count. Just think: the people asking questions on your mailing list may be the ones you want to hire six months from now. (When I gave this talk, one audience member said something even scarier: the people asking questions on your mailing list may be the ones who hire you six months from now.) Undoubtedly, some will be ungrateful, some will be a time sink, some will never learn--but you must give every person a chance until you know what he or she is like.
- Dump RTFM from the ammunition bag
The person asking a familiar question may actually have read the manual. He or she may have read the README file, and the frequently asked questions list, and lots of other stuff--but just not realized that the answers in those things pertained to the question at hand.
- Encourage active learning in a more positive manner
It may indeed be necessary to encourage users to read more documentation, but it can be done in a respectful and supportive way. Some users need training in how to benefit from documentation.
- Many useful explanations are buried in newsgroups, etc.
The problem of structure, which I mentioned earlier, has to be addressed much more formally than the community has up to now. Sometimes a knowledgeable user posts a valuable piece of information, and it gets buried in an archive. Project leaders should recognize these valuable postings, extract them, and turn them into something such as web pages with a more robust presence.
- Create flexible pathways through documentation
People would be very grateful to know what to read first. As a corpus of documentation builds up, someone could contribute a lot just by writing a web page that explains what type of audience each document is for, and what order the documents should be read in.
- Make use of professionally developed documentation
I seem to be obsessed with this topic. . . .
- The volume of documents will be overwhelming
As mentioned earlier, web pages and online postings tend to stay up forever once they are created.
- A guide to the guides
Some kind of portal may ultimately be the best way to provide documentation when there are so many sources and so many tiny contributions.
One project with a massive amount of contributed documentation, the Plone content management system, is trying a strategy of creating an outline for what the developers consider the ideal online documentation. The ultimate size could be thousands of pages if printed. The strategy should help organize existing documentation and at the same time encourage users to write more, because they will know what is needed and how it fits into the whole.
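The "what to read first" page suggested above could, in principle, be generated from a small catalog rather than maintained by hand. What follows is only a minimal sketch under stated assumptions: the document names, the `audience` and `read_first` fields, and the helper functions are all hypothetical, invented for illustration, and do not describe how Plone or any other project actually organizes its documentation. It simply orders documents so that prerequisites come before the documents that depend on them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical catalog: every name and field here is an assumption made
# for illustration. Each entry records the intended audience and the
# documents a reader should finish first.
DOCS = {
    "install-guide": {"audience": "new users", "read_first": []},
    "first-site-tutorial": {"audience": "new users", "read_first": ["install-guide"]},
    "security-checklist": {"audience": "administrators", "read_first": ["install-guide"]},
    "custom-workflows": {"audience": "developers", "read_first": ["first-site-tutorial"]},
}


def reading_order(docs):
    """Return document names ordered so prerequisites always come first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(info["read_first"]) for name, info in docs.items()}
    return list(TopologicalSorter(graph).static_order())


def guide_page(docs):
    """Render a plain-text 'guide to the guides', one numbered line per document."""
    lines = []
    for position, name in enumerate(reading_order(docs), start=1):
        lines.append(f"{position}. {name} (for {docs[name]['audience']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(guide_page(DOCS))
```

The hard part remains the human one: someone still has to decide which documents depend on which, and for whom each is written. The sketch only shows that once those judgments are recorded, producing and updating the reading list is cheap.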
- Let readers rank documents; collate their votes
This proposal is by far the most ambitious in this talk, to the point where we have no idea how to implement it. But if it becomes feasible, it can solve many of the problems in the previous section. While many specific rating systems exist--Slashdot, online book sites--these are hard to generalize into something that works for arbitrary collections of documents.
- Admittedly, reputation doesn't work well in the online world
First, it's hard to motivate people to rate documents. In doing so, it's even harder to avoid offering perverse incentives. Systems that "rate the raters" just beg the question at a higher level.
- But reputation doesn't work so well in the real world either
Everyone knows the experience of going to a concert or restaurant that was highly recommended--and hating it. The world comes with a great big NO WARRANTY sign. So let's try to implement online rating systems, and use them to the extent that they are useful.
- Willingness to tolerate bad advice varies with the subject matter
Computers provide an almost ideal subject matter for community advice-giving, because if somebody gives you bad advice, you usually experience consequences no worse than throwing away the recommended file or rebooting your computer. Compare this to the needs of one audience member during my talk, who actually searched for online advice about how to fix her car. Luckily, because she found the advice on the manufacturer's web site, she was confident it would not be dangerous.
I have also received reports that bad advice can mess up a computer system to the point that it's almost impossible to recover. One person who told me a story of this nature said in anger and frustration, "Writers should be held responsible for what they write." I am not comfortable with the constraints on free speech this implies, but the issue highlights the value of ratings.
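To make the collating step concrete, here is a toy sketch of one way votes might be combined. It is an assumption on my part, not a solution to the problems above: it says nothing about motivating raters or avoiding perverse incentives. The technique is to shrink each document's average toward the overall average (sometimes called a Bayesian average), so that a document with a single enthusiastic vote does not outrank one with many good votes. The function name, the 1-to-5 scale, and the `prior_weight` parameter are all hypothetical.

```python
from collections import defaultdict


def collate_votes(votes, prior_weight=5):
    """Collate (document, score) votes into a ranking, best first.

    Each document's average is pulled toward the overall average, as if
    it had received `prior_weight` extra votes at that overall average.
    Both the scale and `prior_weight` are illustrative assumptions.
    Returns a list of (document, adjusted_score, vote_count) tuples.
    """
    votes = list(votes)
    if not votes:
        return []

    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc, score in votes:
        totals[doc] += score
        counts[doc] += 1

    overall_mean = sum(score for _, score in votes) / len(votes)

    ranked = []
    for doc in totals:
        adjusted = (totals[doc] + prior_weight * overall_mean) / (counts[doc] + prior_weight)
        ranked.append((doc, adjusted, counts[doc]))

    # Highest adjusted score first.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


if __name__ == "__main__":
    sample = [("howto-a", 5)] + [("howto-b", 4)] * 20 + [("howto-b", 5)] * 20
    for doc, score, count in collate_votes(sample):
        print(f"{doc}: {score:.2f} from {count} vote(s)")
```

A larger `prior_weight` demands more votes before a document can move far from the pack; choosing it is a judgment call, which is one small example of why generalizing rating systems is hard.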
The points in the previous sections were intrinsic, I think, to the process of community documentation. A few other failings are less inherent to the process but are important and should be noted.
- English is favored
While lots of online fora and documents exist in other languages, people wanting information on computer use generally still have to know English to get the latest, fullest, and best information. Many projects (such as the Linux Documentation Project, the Free Software Foundation, and the GNOME and KDE desktop projects) are working hard to translate documentation and create alternatives in many languages.
- Cultural differences aren't respected
The previous failing was just a specific instance of a broader problem. There is an implicit culture in most online groups. People discussing computer topics tend to expect that readers will understand the concepts used, will ask questions when they need help, and will stand up for themselves when criticized. But many people have more hierarchical expectations; they may need hand-holding or want to find out who is considered the local authority. They may merely want people to be polite! They should not have to give up their cultural norms just to get information.
- Different learning styles aren't respected
The percentage of drop-outs in the traditional K-12 educational system has dropped drastically over the past few decades, even though money is very limited in the school systems. The reason for the schools' success in this area is that educators understand much better than they used to that people have different learning styles. In the computer field, we too can do better.
- Gaps, haphazard coverage
Whenever one depends on contributions, one inevitably will end up with a glut in some topics and a deficit in others.
- Think of ourselves as a community
We are all responsible for educating each other.
- Leverage what we have to offer through better organization and rating
Create portals, show people what to read and what order to read it in, and give indications of what the most popular documents are.
- Make use of professionals
Professionals offer many advantages once projects find the money to pay them. Try to make use of skills such as editing and design, and existing high-quality documentation such as Safari Bookshelf.
- Encourage new users and respect diversity
New users represent our future.