 Published on O'Reilly Network (http://www.oreillynet.com/)

Thinking Outside the Outbox

by Andy Oram

It's easy to feel jerked back and forth by arguments over the Microsoft anti-trust case. Microsoft backers say the company has been enormously innovative; detractors insist it's held back innovation. It took me a long time to realize that both sides are right. The key is in how Microsoft sees the world and people's relationships in it.

Projecting the corporate persona

Microsoft is an astoundingly successful corporate culture: firmly goal-directed from the top down, adept at fostering creativity within the goals set by management, and intensely collaborative on the team level. (Some of these judgments I derive from a February 2000 Atlantic Monthly article written by leading journalist James Fallows, who spent six months at Microsoft. But my own intuition confirms what he reports. I think those traits are necessary for the kind of successful product development coming out of the company.)

Just as programmers tend to make tools that are comfortable for other programmers, some companies aim products at customers like them. And it happened that Microsoft struck it rich catering to companies with similar cultures and needs. That's not necessarily bad; I think O'Reilly & Associates is also a company that delivers to like-minded customers. But where there's essentially a monopoly, such an orientation becomes a liability for all.

Microsoft and the DOM

DOM, the Document Object Model, is a standard API that rigorously classifies and organizes a Web page according to its structural elements (tags, attributes, entities, and text strings). XML, the Extensible Markup Language, provides tags, attributes, and entities appropriate to the content of the Web page. So you can write a relatively straightforward program to reliably do anything you want with documents, so long as applications like Microsoft Office offer the DOM API and accept XML as a format. And if Microsoft restrains itself from breaking the standard, the documents can be freely exchanged with applications on other systems. For instance, the KDE free software desktop provides a DOM Level 1 interface, which its developers plan to upgrade to DOM Level 2, and uses XML as the storage format in its suite. Applications in the GNOME desktop also recognize XML documents and can parse them using DOM or SAX, a more lightweight API.
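As a rough illustration of the "relatively straightforward program" mentioned above, here is a minimal sketch using the DOM implementation in Python's standard library. The memo format, the `para` element, and its `status` attribute are invented for this example; they are not taken from any Microsoft, KDE, or GNOME file format.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the tag and attribute names are assumptions
# made up for this sketch, not a real office file format.
SAMPLE = (
    "<memo>"
    "<to>Staff</to>"
    '<para status="draft">Budget is due Friday.</para>'
    '<para status="final">Meeting moved to 3pm.</para>'
    "</memo>"
)


def draft_paragraphs(xml_text):
    """Return the text of every <para> element marked status="draft"."""
    doc = parseString(xml_text)
    drafts = []
    for para in doc.getElementsByTagName("para"):
        if para.getAttribute("status") == "draft":
            # Collect only the text nodes directly under this element.
            text = "".join(
                node.data
                for node in para.childNodes
                if node.nodeType == node.TEXT_NODE
            )
            drafts.append(text)
    return drafts


print(draft_paragraphs(SAMPLE))
```

The same calls (`getElementsByTagName`, `getAttribute`, `childNodes`) are part of the DOM standard itself, so in principle a program like this does not care which application produced the document.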

I've tried to summarize the Microsoft corporate personality. But as far as its products go, they took a long time to develop a personality. There wasn't much one could say about MS-DOS and early versions of Windows. Then the company pushed into the market for office software and found its true destiny. However, that destiny is shaped by the environment into which Microsoft insinuated itself. To wax a bit cynical, it sees its role as providing offices with electronically enhanced typewriters and adding machines, not to mention the inboxes and outboxes where bureaucrats of the world have always spent their hours shuffling papers.

I believe that the Microsoft world view dictates the kinds of innovation it puts forward as well as the types it shuts off. It hasn't thought beyond the Outbox. This theory has helped a lot of things make more sense to me, ranging from the success of Microsoft Office to the success of the love bug virus.

Innovation and its undertow

Don't waste time telling me that Microsoft has no ideas of its own and just takes them from other people; that kind of criticism doesn't hold up. Find me a single product feature in any working software package that can't be traced back to some obscure postdoc, who probably published the original idea in an IEEE journal read by 20 people. Microsoft didn't invent the idea of providing a well-defined structure for every document, for instance (among other places, that's in SGML, the progenitor of XML). But it still constitutes innovation for Microsoft to declare that every one of its office products would be based on a documented, exposed structure that could be represented in XML. Better yet, Microsoft component libraries and tools allow people to manipulate that structure through programs in a variety of languages.

Because of the standardization of structure, not only can programmers customize Word, Excel, and other Office software to the needs of their office workers, but output from one tool can be embedded in or fed to another tool seamlessly. Such flexibility may yet come to benefit us all, if the company doesn't muck up the basic DOM and XML DOM APIs with proprietary extensions, as it is so famous for doing.

I honor the advances that Microsoft has managed to bring into being, but what about the advances they've missed? What lurks in the blind spots at the retinal edges of Bill Gates's steady vision?

The problem can be summarized as this: Microsoft views end-user collaboration as part of a uniform, department-wide or company-wide system serving technically unsophisticated staff with no independent needs of their own, where the distribution of software is controlled by a benevolent system administrator who makes sure nothing buggy is sent out. (A strange vision for a company that reputedly broke the "glass house" mainframe model of computing.)

Such a view fits a typical corporate office. But it doesn't fit many new-style companies with looser webs of collaboration, or individuals in home offices working closely with a range of clients, or students, or artists, or people in the broadest sense. These represent a lot of blind spots.

That's why Microsoft didn't anticipate viruses. ActiveX and Windows Scripting Host were designed for airtight local networks behind corporate hubs. Subjected to the unpredictable, buffeting winds of the free-ranging Internet where most of us work, these technologies get blown away. But I wouldn't write this essay just to join the chorus of irate technology journalists and snipe at Microsoft for inflicting e-mail viruses on us. It's not too late for Microsoft to upgrade their security significantly, so their vulnerability to viruses could diminish greatly in the future. I mention the virus problem because it's a symptom of a much deeper loss.

Here are a few things that I wish we could do on our desktops, but that neither Microsoft (because of their corporate world-view) nor others (because of the hegemony of Microsoft) have yet given us.


When I sit down with colleagues at work, we scribble over each other's documents, rip out pages to compare them, snip them into pieces to reassemble them, and do all sorts of other unstructured collaboration. Wouldn't it be great if everyday office products allowed people to do this kind of work while sitting thousands of miles apart? Microsoft Word takes a step in this direction through the "revisions" feature (which I wish all software vendors would emulate), but it would be great for me to watch a colleague cross out a word and shout, "Stop! That's a specific technical term that has to stay put!"

There's nothing new about groupware. Doug Engelbart provided an extremely sophisticated vision of it way back in his AUGMENT project of the early 1960s. The discipline of Computer Supported Cooperative Work (CSCW) has been a fixture of computer science research since the 1980s. Today, Lotus Notes embodies some of those insights. But collaborative computing of the fluid, brainstorming sort is still not commonplace.

Instead, millions of office workers send millions of bulky Word documents, spreadsheets, and PowerPoint presentations over e-mail day after day to their coworkers, who probably don't even want the damn things. Making a minor correction means sending the whole thing over another e-mail to everybody. The waste is enormous. I confess that once I put a certain regular mailing in my e-mail kill file, because I didn't have the heart to tell a coworker that I had no use for the documents he spent his life on.

Microsoft will go on adding features to its bloated office products forever, but never will it support the type of collaborative work that fosters sudden inspiration, in my opinion. This is because such support would require putting more power in the users' hands, leaving it up to the users to define their forums and their ground rules. The model is not top-down.

Collaborative software is hard to develop in any case. Others who have tried it have hit intrinsic technical barriers; I'm not blaming Microsoft for everything. Just checking multiple clients for updates to a common document, managing locks on resources, and monitoring the coming and going of clients requires a surprising amount of overhead. Authentication is also a big job, especially for Microsoft, whose clumsy handling of Kerberos shows that it's a newcomer to the authentication game.
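To give a feel for the bookkeeping involved, here is a toy, single-process sketch of the three chores just listed: tracking which clients are present, managing locks on parts of a document, and letting clients check for updates. The class and method names are hypothetical, invented for this illustration; a real system would also have to handle networking, concurrency, and authentication, none of which appear here.

```python
class SharedDocument:
    """Toy in-memory model of a document edited by several clients."""

    def __init__(self):
        self.version = 0       # bumped on every successful write
        self.sections = {}     # section name -> text
        self.locks = {}        # section name -> client holding the lock
        self.clients = set()   # clients currently connected

    def join(self, client):
        self.clients.add(client)

    def leave(self, client):
        """Drop a client and release any locks it still holds."""
        self.clients.discard(client)
        held = [s for s, owner in self.locks.items() if owner == client]
        for section in held:
            del self.locks[section]

    def lock(self, client, section):
        """Try to lock a section; succeed only if it is free or already ours."""
        if client not in self.clients:
            return False
        owner = self.locks.setdefault(section, client)
        return owner == client

    def write(self, client, section, text):
        """Write only if the client holds the lock on that section."""
        if self.locks.get(section) != client:
            return False
        self.sections[section] = text
        self.version += 1
        return True

    def changed_since(self, version):
        """Let a client poll for updates it has not yet seen."""
        return self.version > version


doc = SharedDocument()
doc.join("ann")
doc.join("bob")
print(doc.lock("ann", "intro"))   # ann gets the lock
print(doc.lock("bob", "intro"))   # bob is refused while ann holds it
```

Even this stripped-down version has to decide what happens to locks when a client disappears, which hints at why the real thing carries so much overhead.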

In short, we have a long way to go before collaborative work can be a part of the kind of robust, easy-to-use software that is ready for the masses. But the dominance of Microsoft products will hold it back for an indefinite amount of time.


People who collaborate do so with all their senses and faculties. We jabber at each other, put our arms around each other's shoulders, and scrutinize each other's faces. Researchers in human-computer interaction (HCI) have long stated that computers will be severely limited until they can recognize and support a wide range of modes of behavior. For me, interacting with a technical instrument is not so interesting as interacting with other people through the instrument, but in both cases the computer needs a lot of capabilities that are only in laboratories now.

We can appreciate what Microsoft has done for accessibility (adding features at the behest of blind users, for instance, so that they can interact with a graphical interface), and they are also reportedly working behind the scenes on voice recognition. Barking an order at your File menu, however, is still no progress toward an integrated system that allows voice, gestures, and other everyday means of interaction. Not even a playback technology like DirectX comes close to such a system.

There's no barrier to Microsoft's providing a breadth of interactive functionality, but I just don't see them being the company to do so. They aren't tuned into telephony, robotics, and other technological areas where such innovations are likely to emerge. Why should any one company be strong in all areas? What we need are opportunities for experts in such areas to offer products to the general public. But anything they do will be marginal so long as most people rely on Microsoft products. Maybe we need HCI technology that notifies us of our blind spots.


Version or change control is another common software practice that Microsoft doesn't understand. If I'm going to get a weekly mailing of a spreadsheet, I'd like to be able to find out quickly what's different this week. I suppose I could write a script to do so, but the software should really provide it for me gratis. Or I could ask, "By what percentage did such-and-such a product change in sales this week?" Cross-comparisons are also valuable: "What else changed this week that could have had an effect on the product?" And I'd like to know what people are thinking: "How has each stage of review changed the way we state a sensitive point?"
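The kind of script alluded to above might look something like the following sketch. It assumes each week's spreadsheet has been exported as CSV with `product` and `sales` columns; those column names, the sample figures, and the function names are all made up for illustration.

```python
import csv
import io


def load_sales(csv_text):
    """Parse CSV text with 'product' and 'sales' columns into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["product"]: float(row["sales"]) for row in reader}


def weekly_changes(last_week, this_week):
    """Map each changed product to (old, new, percent change).

    Percent change is None when a product was added or removed,
    or when last week's figure was zero.
    """
    changes = {}
    for product in sorted(set(last_week) | set(this_week)):
        old = last_week.get(product)
        new = this_week.get(product)
        if old == new:
            continue  # nothing to report for this product
        if old is None or new is None or old == 0:
            pct = None
        else:
            pct = round((new - old) / old * 100, 1)
        changes[product] = (old, new, pct)
    return changes


# Invented sample data standing in for two weekly mailings.
LAST = "product,sales\nwidgets,200\ngadgets,50\n"
THIS = "product,sales\nwidgets,250\ngadgets,50\ngizmos,10\n"

for product, (old, new, pct) in weekly_changes(
    load_sales(LAST), load_sales(THIS)
).items():
    print(product, old, new, pct)
```

A script like this answers only the first question, what is different this week. The cross-comparisons and the history of how a sensitive point was restated need information that a pair of snapshots does not carry, which is the argument for building change tracking into the software itself.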

Change control systems also allow people to add meta-comments about why things changed. Microsoft Word offers lots of great meta-information: the outline and document map (although I find they're not very helpful unless the writer was thinking about them while writing), revision control, embedded comments, and so on -- but they don't add up to a sense of history about the document. History, of course, is also an important part of collaboration.


A lot of creative work is messy. I hope that the next generation of end-user software provides some structure within which users can make a mess and then clean up. For instance, it would be nice for a graphic designer to be able to send a client a bundle of graphics and text with three alternative layouts that the client could view at the click of a button.

Documents also have multiple audiences and, therefore, a reason to show the same (or related) information from multiple angles. One statue, many faces. The most familiar face that people add to their document is the executive summary, but there can be many others. Different people come in at different levels looking for different things. I know Microsoft is obsessed with the visual look of documents, but I don't see evidence that they're concerned about what's being looked at.

To sum up what I feel is missing from productivity software, I see a document as an evolving, multi-faceted work. It has lots of contributors, it can be approached with a variety of senses, it has a history, and it is viewed by different people with a variety of goals in mind. Real-time collaboration, perceptual interfaces, version control, and flexible display should be mainstream.

Is there an alternative?

It's commonplace nowadays to point to the Internet's promise of instant access as the driving force behind the need for flexibility in modern work. It's also commonplace to say that Microsoft "doesn't get the Internet." It would be hard, though, to point to a company that really does "get the Internet." Even BBN, who invented the Internet, never seemed to really "get it"; for decades their network division stagnated under the management of three different companies until it was spun out to become a new company called Genuity, as part of a corporate deal to merge more lucrative telecommunications divisions.

Internet collaboration is still so new that few organizations really understand it even while they're doing it. O'Reilly & Associates, for instance, works with a number of experts outside the company proper who participate in discussions and development on a weekly basis; they're on our mail aliases and get to stick their fingers right into lots of our software. This is a very modern way of working, but we don't have any formal computing structure you could call an extranet, nor do we know how to represent these crucial helpers in our technical administration and personnel policies. Furthermore, while staff throughout the company work closely together on book text, graphics, design, and marketing materials, we depend far more on Federal Express than a high-tech leader should have to.

If any computer manufacturer "gets it," you'd probably have to say it's Sun Microsystems. And Sun does have a plan for changing how we use our computers. But it doesn't have anything to do with our current office products. While Microsoft was building a business on office workers, Sun made their money providing servers. Naturally enough, Sun sees a future dominated by powerful servers, which interact through Java and JINI with tiny embedded devices, smart cards, and telephones. But that's a long-term plan, and even though it may start to come true within a few years, many more will pass before it reaches the desktop where I'm typing right now.

So we are caught between past and future. The past is represented by the enhanced typewriters and adding machines that we work with in our offices, exchanging our creations through the Inbox and Outbox of Microsoft Outlook. The future is a wireless world where I can turn down the thermostat at home by speaking into my cell phone. (You won't see me put one of those things next to my head, but I'm willing to try a model with a detached ear piece.) I don't want to be stuck with the past until the future arrives.

How to promote Internet-savvy software

It's interesting that, in the Microsoft anti-trust case, Judge Thomas Penfield Jackson has suggested going further than the Department of Justice wants to go and splitting the Internet side of Microsoft's business off from its office productivity suite. Since the Department of Justice has not asked for this, I don't expect the suggestion to survive into the final ruling. Nor do I have either the legal or the technical knowledge to take a stand on the anti-trust case. But I find it interesting to speculate on how such a break-up could advance or retard the integration of Internet-savvy collaborative features into our day-to-day tools.

Microsoft thinks in terms of integration. It claimed that one of its great innovations was embedding a browser in Windows Explorer. By this type of thinking, it would be folly to force office tools to be developed in a separate company from Internet tools. The thrust of this essay suggests that one should consider the office and the Internet as all of a piece. For every feature developed in every tool, designers should be asking, "How does the Internet affect our choice?" And the idea that you can define one feature as an office productivity feature and another as an Internet feature is absurd.

But I wouldn't jump to the conclusion that Jackson's suggestion is wrong. Suppose the developers of office suites knew they couldn't use TCP/IP in their products but that TCP/IP access had to be supported and enabled? (If they just ignored remote connectivity, their products would fall into the gutters by the side of an evolving marketplace.) Suppose that, instead of stuffing some collaborative tools in by kludging XML here and URLs there, they tried to make their products attractive to Internet development companies by providing the tools for extending the software? To do this, they'd provide well-documented APIs and adhere to standards.

That's what the development community is really asking for, isn't it? Standards, documented APIs, and a chance to compete? Perhaps a break-up would draw in the wild ideas of ten thousand new talents, filling the vessels provided by the most popular applications the software industry has ever seen.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.


Copyright © 2009 O'Reilly Media, Inc.