Marshall McLuhan vs. Marshalling Regular Expressions

Regular expressions extend the reach of text, and therefore inexorably change how we sense the text.

As this article has already said, regular expressions are supremely a textual medium. They represent the complete conquest of text. They become a world contender when fortified by Unicode, which most computer languages and regular expression packages now support to some degree. Unicode gratifies the alphabetic print culturist's ultimate fantasy by regularizing all linguistic expressions in ordered, discrete abstractions.

The renewed importance assigned by computer programmers, perhaps surprisingly, to the old medium of text reflects the intrusion of the Internet into a field of electronic media previously focused on entertainment. The Internet has raised the retrieval of textual and numeric information (such as news, weather, and financial data) to a mass phenomenon.

Like calculus (which McLuhan considered a conquest of the tactile area of numbers), regular expressions anticipate the unpredictable and bring repeatability to the immeasurable. A simple * (which means "zero or more of the preceding item") compresses everything from zero to infinity into a calculable scheme.
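To make the reach of * concrete, here is a minimal sketch using Python's standard re module. The pattern and the sample strings are illustrative assumptions, not examples taken from Friedl's book.

```python
import re

# 'ab*c' means: an 'a', then zero or more 'b' characters, then a 'c'.
pattern = re.compile(r"ab*c")

# Hypothetical samples: zero b's, one b, several b's, ten thousand b's,
# and one string that is not of this shape at all.
samples = ["ac", "abc", "abbbbbc", "a" + "b" * 10000 + "c", "adc"]
results = [bool(pattern.fullmatch(s)) for s in samples]

for s, matched in zip(samples, results):
    label = s if len(s) <= 10 else f"{s[:5]}... ({len(s)} chars)"
    print(label, "->", matched)
```

A single seven-character expression accepts every one of the first four strings and would accept any longer run of b's as well.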

But let us look more closely at this *. It challenges the precision of text. It is neither an A nor a B, and therefore cannot be found in Gutenberg's box of type. Its location cannot even be fixed.

McLuhan writes in Understanding Media that "the clock visually separates time from space." But the computer's millisecond-driven clock destroys all time on a human scale in "the electronic age, which found that instant speeds abolish time and space." In the same way, while text parses and subdivides thought, * dissolves and absorbs all text. Gutenberg separated oral speech into figure and ground, but * combines them again. Like the electron in its post-Newtonian atom shell, * ranges freely and resides nowhere.

With traditional text compilation (using yacc or similar tools), text is parsed in iterative steps. It is broken down into the smallest possible atoms called tokens and processed in figure/ground fashion with an intense attention to the relationship of each token to its context--the classic scientific method promoted (according to McLuhan) by print culture.

Used tentatively, as a beginner would use them, regular expressions may seem just an added convenience to the traditional lexicon of token-processing tools. A garden-variety use of regular expressions might be "extract the text between the fourth and fifth colons in a line," a perfectly natural operation that, for instance, can obtain a user's real name from a Unix system's password file.
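The garden-variety task might be sketched as follows in Python. The password-file line is invented for illustration, and the sketch assumes the conventional layout in which the real name sits in the fifth colon-separated field.

```python
import re

# Hypothetical entry in the usual passwd layout:
# name:password:uid:gid:real name:home directory:shell
line = "jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh"

# Skip four colon-terminated fields, then capture everything up to the
# next colon, i.e. the text between the fourth and fifth colons.
fifth_field = re.compile(r"^(?:[^:]*:){4}([^:]*):")

match = fifth_field.match(line)
real_name = match.group(1) if match else None
print(real_name)
```

Nothing here goes beyond what a field-splitting tool could do; the expression is simply a compact way of counting colons.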

A reader of Friedl's book may well begin it with such tasks in mind, but a world-altering shift in thinking takes place by the time he or she progresses beyond the third chapter. It may occur gradually and intermittently, because Friedl takes care to present it through quiet demonstration and example, gingerly pushing forward the reader's transformation from different angles--but it definitely occurs.

Used to their fullest, regular expressions ignore figure/ground. They operate holistically. They swallow the entire text--sometimes tens of thousands of characters in one fell swoop--and create an impression of it. When you are processing a concept like "find a quote-delimited string, but not where either quote lies inside a comment," the result is a function of the whole text, not of individual characters.
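One common way to approach the quoted-string problem is to let a single expression match either a comment or a string, and then keep only the strings. The sketch below is written under loud assumptions: comments are C-style /* ... */ blocks, strings are double-quoted with backslash escapes, and the function name strings_outside_comments is hypothetical. It is not presented as Friedl's own solution.

```python
import re

# One alternation: whichever construct starts first in the text wins,
# so a quote inside a comment is swallowed by the comment branch and a
# "/*" inside a string is swallowed by the string branch.
token = re.compile(
    r"""
    (?P<comment> /\* .*? \*/ )              # a C-style block comment
  | (?P<string>  " (?: [^"\\] | \\. )* " )  # a double-quoted string with escapes
    """,
    re.VERBOSE | re.DOTALL,
)

def strings_outside_comments(text):
    """Return the quoted strings in text that do not lie inside a comment."""
    return [
        m.group("string")
        for m in token.finditer(text)
        if m.group("string") is not None
    ]

source = 'x = "real"; /* not "this" one */ y = "has /* no comment */ inside";'
found = strings_outside_comments(source)
print(found)
```

Whether a given quotation mark opens a string depends on everything that precedes it, which is why the answer is a property of the whole text rather than of any one character.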

Gutenberg set his type one character at a time; regular expressions combine characters into their conceptual functions. You can extract an XML tag by entering <[^>]+>, which appears to fulfill print culture's goals of isolating and dissecting an object. But <[^>]+> is fuzzy, matching any XML tag rather than a fixed sequence of text.
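The fuzziness of <[^>]+> is easy to see in a short Python sketch. The XML fragment is an invented example.

```python
import re

# '<', then one or more characters that are not '>', then '>'.
tag = re.compile(r"<[^>]+>")

# Hypothetical fragment for illustration.
doc = '<book id="42"><title>Mastering Regular Expressions</title></book>'

tags = tag.findall(doc)
print(tags)
```

The same expression picks up opening tags, closing tags, and tags with attributes. It is only an approximation of XML, though: an attribute value that contains a > character would end the match early.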

The elusive tension between print-culture analysis and electronic-culture holism gives Mastering Regular Expressions its power to intrigue. The book itself celebrates print culture in myriad ways. The writing is precise enough to reward careful readers and to prepare them for the dual job required by the technology: to analyze the effects of each regular expression and to analyze the text that it is parsing. One must possess a print-culture training to compare (\d)+ to (\d+) and determine the differences in their side effects.
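The difference between (\d)+ and (\d+) lies in what the parentheses capture. The following sketch shows the behavior in Python's re module; other regular expression flavors may report repeated groups differently, so treat it as an illustration of one flavor only.

```python
import re

text = "1234"

# (\d)+ : the group holds ONE digit and is repeated, so after the match
# it retains only the last digit it captured.
last_digit = re.fullmatch(r"(\d)+", text).group(1)

# (\d+) : the repetition is inside the group, so the group holds the
# whole run of digits.
all_digits = re.fullmatch(r"(\d+)", text).group(1)

print(repr(last_digit), repr(all_digits))
```

Both expressions match exactly the same strings; only the side effect, the captured text, differs.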

As another sign of its obeisance to print culture, Mastering Regular Expressions digs into every available cranny in the print-maker's toolbox. Fonts, special characters, and page layout are all put to work; Friedl's mastery over the dominant sense of a print culture--the visual sense--is evident.

But even in his superficial concerns, Friedl departs from McLuhan's characterization of print culture. When he explores the meaning of uppercase and lowercase, the question of whether the accent on the é in cliché is integral to the é or separate from it, or the problem of recognizing a space character among its "dozen or so" different Unicode representations, these are not truly literary concerns.
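Both of these puzzles can be reproduced with Python's standard library. This is a small sketch with sample strings chosen for illustration; it assumes Python 3, where \s in a string pattern matches Unicode whitespace.

```python
import re
import unicodedata

# The same word spelled two ways:
composed = "clich\u00e9"      # é as a single code point
decomposed = "cliche\u0301"   # plain e followed by a combining acute accent

same_before = composed == decomposed
# NFC normalization folds the e and its accent into one code point.
same_after = unicodedata.normalize("NFC", decomposed) == composed
naive_hit = re.search(composed, decomposed) is not None

# Three different space characters: no-break space, em space,
# ideographic space.
spaced = "a\u00a0b\u2003c\u3000d"
words = re.split(r"\s+", spaced)

print(len(composed), len(decomposed), same_before, same_after, naive_hit)
print(words)
```

The two spellings look identical on the page yet differ in length and fail a naive comparison until they are normalized.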

An interest in typography is not the same as an interest in text. Members of print culture are interested in a passage's sense as abstracted from its appearance. Just as printers believed they were preserving all aspects of text as they transferred it from manuscript to plates of type, a member of the print culture gets impatient discussing a space character.

Friedl stresses both the scientific method and a less formal feel for context, calling the marshalling of regular expressions an art. It's not important whether his readers get the precise meaning of every sentence, because understanding comes gradually over the course of many pages of examples, tests, and metaphors.

His repeated admonitions to pay attention to context--to the ground that supports the figure--may prove irritating to someone trapped by print culture, wanting the figure to be isolated from its ground, fixed, and broken down to atoms. Friedl's process may, in contrast, conform naturally to the expectations of someone who swims in electronic culture.

Thus, regular expressions confirm the thesis presented by McLuhan in Laws of Media: "When pushed to the limits of its potential...the new form will tend to reverse what had been its original characteristics." Text in the age of regular expressions reverses its most fundamental characteristics of division, isolation, and specialization. McLuhan would have been enthralled with regular expressions because they expose the whole within the parts. Their painstaking pursuit and cataloging of individual, discrete alphabetic characters leads to the dissolution of the figure/ground distinction that McLuhan attributed to alphabetic text.

What will the emerging culture look, feel, sound like?

The success of Mastering Regular Expressions should help assuage our McLuhan panic. We are not condemned to lose our reason and be caught up in a polyglot tele-babble of visceralism. We can have our media cake and eat it too.

McLuhan portrayed electronic media as an assault against reasoned choice. We swallow everything that comes across the radio waves; we can no more differentiate and filter television images than a newborn baby can distinguish what is put in its mouth. Infantilism reigns within mass media, as viewed from the vantage point of the 1960s. But with digital processing, we can become finicky eaters indeed. Now we analyze, we extract, we rotate and scale.

The key lies in choosing our tribe carefully. Instead of McLuhan's vision of a resurgent oral/tactile culture of television, we can embrace hacker culture. With the help of regular expressions and other digital processing, individuals can shape media into what they want.

Writing in the 1960s and 1970s, McLuhan can be forgiven for ignoring hacker culture. But this extension of human capability may, in classic McLuhanesque fashion, alter our relations with media and society. Hacker culture will attract malnourished seekers of oral community much more than the cynical cheeriness of television or the cell phones to which so many cling like a lifeline to community.

New media always make it easier for users to express themselves. That is inherent in their newness, for otherwise no one would bother to adopt them. The Internet extends the traditional human abilities to see, to speak, and to manipulate. The revolution is not so much one of content but of distribution. Computers allow the manipulation of old content and old media in unanticipated ways.

McLuhan says that media cause the world to change just as relationships between our senses change. And digitization certainly fits the model.

Given open standards, easy scripting languages, and cheap, versatile devices, digitization could allow users a degree of control over content never before imaginable in history. Conversely, given welded-case devices and access controls, it could allow the owners of content a degree of control over users never before imaginable in history.

A closed, unprogrammable device fits McLuhan's most dire assessment of automation and its numbing effect. But once a hacker breaks open the device and reprograms it, he reclaims not only the device itself but all media with which it comes in contact. We have seen the potential of new media. Let us now reach out and grasp it.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is
