This is the final post in a series that began with Part 1
and Part 2. In this series I’m presenting the results
of a research project to measure the effectiveness of two mailing
lists, which I hope will be the start of a larger study.

How much noise is on the lists?

Figure 1 shows the breakdown of messages into the four categories I
defined in Part 1. The precise breakdown is:

Category     Number of messages   Percent of total
Helpful      118                  57
Irrelevant   54                   26
New          28                   14
Unhelpful    6                    3

Figure 1. Categorization of messages by helpfulness
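
The percentages in the table follow directly from the raw counts. As a quick check, here is a minimal sketch in Python; the variable names are mine, and only the four counts come from the table:

```python
# Message counts per category, copied from the table above.
category_counts = {
    "Helpful": 118,
    "Irrelevant": 54,
    "New": 28,
    "Unhelpful": 6,
}

total_messages = sum(category_counts.values())

# Share of each category, rounded to the nearest whole percent.
category_percents = {
    name: round(100 * count / total_messages)
    for name, count in category_counts.items()
}

for name, percent in category_percents.items():
    print(f"{name:<12}{category_counts[name]:>5}{percent:>5}%")
print(f"{'Total':<12}{total_messages:>5}")
```

The counts sum to 206 messages, and the rounded shares match the figures reported in the table.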

The low number of unhelpful messages is encouraging. But the high
number of irrelevant messages (even though some irrelevant messages
have value for list members in ways that don’t directly pertain to
solving technical problems) shows that participation in the list has
high overhead. And indeed, the volume of irrelevant messages is a
common complaint among users of mailing lists.

Most lists contain off-topic threads, as well as long threads about
non-technical issues such as upcoming conferences. As I indicated,
non-technical messages may be valuable for reasons of their own, and
if their subject lines are clear (not always the case) they can be
avoided by people who are uninterested in them.

But many technical threads also contain irrelevant messages. As one
example, a new user posted a question about hardware and was receiving
useful feedback when someone complained that he hadn’t asked the
question the right way. This led to a long series of messages about
the right way to ask the question, dragging along more and more
acrimonious accusations that the complainer was not being nice to
newbies. This exchange, involving a lot of correspondents, created so
many irrelevant messages on the thread that I had to be careful in
interpreting results to avoid letting it skew them.

There is no doubt that the exchange was distracting and wasteful. The
person who posted the original message spent time reading the
exchange; I know this because he threw in his own opinions a few times
and tried to justify the way he had asked his question.

Incidentally, this was one of the resolved threads—that is,
despite the noise on the thread, the answer to the technical question
eventually came. I thought the original correspondent would be scared
away forever, but in fact he was back the very next day with another
question. In one sense this is good: he stuck with Linux and with the
list. But in another sense it’s a bad result, because it shows he had
not learned enough about his system to solve his problems by
himself. I will explore the question of reader education in the
conclusion to this article.

How many references were offered to external sources of information?

A key goal of any mailing list should be to wean users from the
list. New users need help to find information, and even experts are
sometimes stumped, but every user should find himself or herself using
the list sparingly and should strive to become adept at solving
problems without help. Another way to state this is that users should
evolve from questioners to respondents. This philosophy highlights
the value of pointing list members to outside sources of information.

Most of the 28 questions I recorded were specific and detailed enough
to show that questioners had worked quite a bit before resorting to
the list, and had made good-faith efforts to find information on their
own. Even so, one assumes that many questions are answered in release
notes, bug reports, project web pages, and other places that may be
hard to find. Therefore, one would expect answers to point to outside
documentation.

A simple check for references to URLs, as well as to books and to
traditional Unix documentation such as manpages and info pages, shows
that a modest amount of referencing occurs:

References to web pages or other URLs: 23

This is a decent number of references for a sample of 28 threads.

References to man or info pages: 6

These are traditionally found on the computer system along with the
software they document. The disparity between references to URLs and
references to this offline documentation shows that documentation is
moving online, where it can be updated dynamically. As evidence of
this trend, manpages and info pages can usually now be found online as
well as on the local computer system.

References to printed books: 0

This result gave me pause as a book editor in the Linux space. I was
not surprised by it, though, because in several years of sampling the
Linux-related lists I have never seen anyone recommend a book. Once or
twice a questioner explicitly requested advice on what book to get,
and received one or two replies. But there seems to be a macho
attitude in the Linux community toward solving problems through
experimentation and consultation of quick-reference material. On other
lists, I’ve seen many more recommendations for books.
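
The article does not show how the check for references was run, so the following is only a rough sketch of what such a check might look like in Python. The patterns, the function name `count_references`, and the sample messages are all assumptions of mine, not the queries used in the study. The patterns are deliberately crude: a phrase such as "info about" would be counted as an info-page reference, so results from a check like this would need review by hand.

```python
import re

# Hypothetical patterns; the study's actual queries are not shown in the article.
URL_PATTERN = re.compile(r"(?:https?|ftp)://\S+", re.IGNORECASE)
# Matches phrases such as "man fstab", "info grep", or "fstab(5)".
MANPAGE_PATTERN = re.compile(
    r"\b(?:man|info)\s+[\w.-]+|\b[\w.-]+\(\d\)", re.IGNORECASE
)


def count_references(messages):
    """Count messages that contain at least one reference of each kind."""
    counts = {"url": 0, "man_or_info": 0}
    for body in messages:
        if URL_PATTERN.search(body):
            counts["url"] += 1
        if MANPAGE_PATTERN.search(body):
            counts["man_or_info"] += 1
    return counts


# Made-up message bodies, for illustration only.
sample_messages = [
    "See http://example.org/howto for details.",
    "Try reading man fstab, or fstab(5) online.",
    "It works for me.",
]
print(count_references(sample_messages))
```

Counting messages rather than individual matches keeps a single reply that lists several links from inflating the totals. Whether the study counted messages or individual references is not stated.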

The mere presence of references does not mean that the references are
useful. But they show that mailing lists are part of a larger learning
environment, and to some extent the members recognize that. However,
interaction with external references is very limited. People who post
references do not generally help the readers understand what to look
for in the documents or how they apply. I will expand on this point in
the conclusion.

Conclusion: the role of mailing lists

Mailing lists can be expeditious sources of information and can build
community. But the research behind this article suggests that mailing
lists fall short sometimes, and involve some inherent inefficiencies
even when they succeed in providing answers.

When someone brings a question to a mailing list, there are three
possible causes, each a type of information failure:

  1. The information has not been written anywhere. This is a common
    problem, despite the vast amount of written computer
    documentation. There are many subtleties in complex software that have
    gone unexplained. Information may also be missing for bugs or quirks
    in newly released software.

  2. The information has been written, but it cannot be found. This
    happens because web searches are not perfect, and few projects provide
    well-organized collections of pointers to the relevant information
    stored in idiosyncratic places.

  3. The information has been written and can be found, but the user does
    not know how to find it. This is part of the larger education problem
    I have been discussing.

In the common case where someone is entering a command with the wrong
options, and someone provides the right ones, one might classify this
as the third type of information failure. After all, the questioner
had access to the documentation but misinterpreted it.

I’d like to reframe the viewpoint, though. If someone misinterpreted
documentation, he lacked the background knowledge to understand it. I
think this is a failure of the first type. What he needs is
documentation that helps him place the command and options in the
context of his needs. He needs background documentation. This is the
hardest type of documentation to produce, because mere academic
descriptions of systems offer little to most users.

The need for background is hard to explain to computer users, and hard
to define as a goal. My greatest concern is that mailing lists provide
answers too quickly and too easily. The questioner
may be guided toward a broader and deeper understanding of her system,
but often she is just told what to do to solve her problem with no
breadth or depth. She has been given a fish for today, but tomorrow
she may find herself marooned without a sinker attached to her hook.

On the other hand, the solution should not involve requiring every
user to read thousands of pages of background documentation before
touching the system. We need to find a path that combines John Dewey’s
classic doctrine of “learning by doing” with techniques for building
mental models that guide users to solving their problems. Community
support through mailing lists and IRC channels can certainly play a
role. But we don’t yet know how this works, and it probably works
differently for every individual. I think mailing lists could do a lot
more.

First, list members could work harder to investigate the questioner’s
problem. Certainly, this is hard to do when they are at remote
locations. It would be interesting to experiment with the use of
remote login to let experts look at a malfunctioning computer
system. Trust would have to be pretty high for this to work, though. A
more feasible solution that is often seen on mailing lists is for
experts to suggest commands to enter and symptoms to look for. If this
could be formalized into a troubleshooting procedure linked to common
symptoms, future users could benefit.

Furthermore, experts could give new users not only pointers to
documentation, but guidelines for what to look for. Reading technical
documents is a skill that grows with technical knowledge.

So a mailing list can play a role in filling a gap between the
documentation and the user’s understanding. But list members should
understand the importance of providing this bridge. Mailing lists must
be seen as part of an information ecosystem in which they are one of
the most supple and fastest-moving creatures.


The data for this study is available in the form of the original mail messages (a gzipped tar file) and results of database queries.