Are VHLLs Really High-Level?
by Greg Wilson
Now contrast that with the programmer in the next cubicle. She is probably sitting in front of a glass TTY like emacs or the Visual Studio editor, transcribing the hexadecimal addresses presented by a debugger to create a hand-drawn box-and-arrow picture of an improperly-linked list. There is no simple way for her to embed that picture in her source code. She will therefore probably have to re-draw it two weeks from now, when she realizes that the recursive makefile she was using wasn't setting debugging flags properly in the sub-sub-directory that contained the source for the linked list class.
Once you open your eyes, it's hard not to believe that programmers want to change everything except the way they themselves work. One sign of this is the silent veto most programmers have given to better working practices, even though there is overwhelming empirical evidence that these practices would make them more productive [McConnell 1996].
Another sign is the subject of this article. Over the past few years, I have done several medium-sized projects using both Perl and Python. At first, I was very excited by what these Very High-Level Languages (VHLLs) let me do, and how quickly. The more I played with them, however, the less satisfied I was. In particular, I no longer believe that they deserve the 'V' in their name. This article explores why, and suggests some ways in which they could evolve.
USENIX 1994]. Among their common features are:
In addition, many VHLL libraries are written by wrapping code written in a lower-level language (such as C) with a higher-level interface.
Much of today's thinking about VHLLs can be traced to the Unix tool AWK [Aho 1988]. Its authors stated explicitly that AWK was designed to support the filter-style programming of most common Unix tools. Line-oriented input and output were handled automatically, as was pattern-matching on those lines. Memory was allocated and deallocated as needed, variables didn't need to be declared, and both dynamic arrays and dictionaries were built into the language. In short, AWK handled all the common problems involved in building a generic Unix filter, and left the programmer free to concentrate on issues that were specific to a particular filter.
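As a rough illustration of what AWK takes care of implicitly, the sketch below spells out a filter of that kind in Python. It is not AWK and not taken from [Aho 1988]; the function name and the sample log lines are invented, and the comments mark the chores that an AWK program would not have to write down.

```python
import re
from collections import defaultdict


def count_by_first_field(lines, pattern):
    """AWK-style pass: for each line matching `pattern`, tally its first field."""
    matcher = re.compile(pattern)
    totals = defaultdict(int)      # dictionary grows on demand, nothing declared up front
    for line in lines:             # the read-a-line loop that AWK runs implicitly
        if not matcher.search(line):
            continue               # AWK would express this as a pattern guarding an action
        fields = line.split()      # whitespace field splitting, like AWK's $1, $2, ...
        if fields:
            totals[fields[0]] += 1
    return dict(totals)


# Hypothetical input: count login events per user.
sample = [
    "alice login ok",
    "bob login failed",
    "alice logout ok",
    "alice login ok",
]
logins = count_by_first_field(sample, r"login")
```

Almost everything here except the counting line is plumbing, which is the part AWK removed.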
Twenty years later, AWK has been superseded by Perl and Python; Tcl and Visual Basic have evolved to do for GUIs what AWK did for streams of text; and Scheme (which predates AWK) has retroactively been added to the same family of languages in many people's minds. Like AWK, all of these languages handle many everyday issues automatically. Because they are so often used to drive other software (using shell calls, wrapper libraries, or COM), these languages are often referred to as "scripting languages". Many tasks that were once done using shell scripts are now done using Perl or Python, as are tasks such as CGI scripting that would have been impractically difficult with /bin/sh and its offspring.
However, the rest of the programming language world has not stood still in those twenty years either. C has largely been replaced today by C++ and Java, both of which let programmers work at a much higher level of abstraction. As a result, the gap between "production" and "scripting" languages has become much narrower--so much narrower that it is no longer clear exactly what the added value of the latter category is. Built-in dictionaries? Java has them, and so does C++'s Standard Template Library (STL). Regular expressions? You can get at them through a library interface as easily from C++ or Java as you can from Python. Automatic memory management? Many C++ libraries use some combination of overloaded assignment operators and reference counting to take care of things just as well (or rather, just as poorly) as Perl and Python. Java, on the other hand, has real garbage collection, so that programmers can build graphs, or pass callbacks around, without worrying about the possibility that they are creating circular references.
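To make the circular-reference worry concrete, here is a small Python sketch. It assumes CPython, whose current releases pair reference counting with a separate cycle collector exposed through the `gc` module; switching that collector off leaves only the reference-counting behaviour described above. The `Node` class and the function names are invented for illustration.

```python
import gc
import weakref


class Node:
    """A graph node that can point at one other node."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    """Build two nodes that refer to each other; return a weak reference to one."""
    a = Node("a")
    b = Node("b")
    a.peer = b
    b.peer = a
    return weakref.ref(a)   # a weak reference does not keep `a` alive


def observe_cycle():
    """Report whether a cycle outlives its last name, with and without the collector."""
    gc.collect()
    gc.disable()            # assumption: CPython; only reference counting remains
    try:
        probe = make_cycle()
        # The names `a` and `b` are gone, but each node still holds the other.
        survived_refcounting = probe() is not None
    finally:
        gc.enable()
    gc.collect()            # the cycle collector finds and frees the pair
    reclaimed_by_collector = probe() is None
    return survived_refcounting, reclaimed_by_collector
```

With reference counting alone the two nodes are never freed, which is why a programmer building graphs or callback chains has to think about cycles unless the runtime traces them.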
In fact, the only two advantages that so-called VHLLs still have over their "merely HLL" counterparts are dynamic typing and interactive interpretation. Many experienced developers question whether the first is really a good thing: for everyone who believes that strong typing is a crutch for people with weak memories, there is someone else who argues that strong typing helps catch, or prevent, many errors, and thereby reduces program development time.
Similarly, a modern development environment like Microsoft Visual C++ is effectively as interactive as the Python command line. If I make a three-line change to a C++ program, then press F5 to re-compile, re-link, and run the executable under the debugger, I'm back into my debugging session faster than if I make a similar change to my Python script and re-run it. And yes, it is handy to be able to interrupt a program, change the implementation of a method, and then have the program continue, but I simply don't believe this makes much difference to overall development time once programs grow above a certain (relatively small) size.
I therefore think that the future development of VHLLs should be guided by the answers to two deeper questions:
However, modules built in this way are almost guaranteed to be as low-level as their starting points. As a result, while modules written on a clean sheet of paper tend to be noticeably higher-level than modules based on legacy libraries, the latter tend to colonize their respective ecological niches first, and thereby prevent the former from ever evolving.
For example, compare Perl's widely-used HTML generation and scripting module CGI.pm with its relational database interface modules. The former encourages programmers to manipulate an abstract tree representing hypertext entities, and makes it easy to convert between that representation and others (such as arrays of list elements). The database interfaces, on the other hand, require programmers to do most of their work by constructing strings that (hopefully) consist of legal SQL. Such "programming with sprintf" is the VHLL equivalent of goto statements, but seems unremarkable to many programmers because "we've always done it that way".
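The fragility of string-built SQL is easy to sketch. The example below uses Python's standard `sqlite3` module rather than the Perl database modules discussed above, so it illustrates the style and not those modules' interfaces; the `authors` table and the helper names are hypothetical.

```python
import sqlite3


def build_query_with_formatting(name):
    # The "programming with sprintf" style: the SQL statement is assembled as text.
    return "SELECT id FROM authors WHERE name = '%s'" % name


def find_author_ids(conn, name):
    # Parameter binding: the value is passed beside the statement, not pasted into it.
    rows = conn.execute("SELECT id FROM authors WHERE name = ?", (name,))
    return [row[0] for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO authors (name) VALUES (?)", ("O'Brien",))
    try:
        conn.execute(build_query_with_formatting("O'Brien")).fetchall()
        formatted_ok = True
    except sqlite3.OperationalError:
        # The apostrophe in the name ends the string literal early,
        # so the assembled text is no longer legal SQL.
        formatted_ok = False
    bound = find_author_ids(conn, "O'Brien")
    conn.close()
    return formatted_ok, bound
```

Parameter binding is only a small step up from raw strings: the query is still text. It does show, though, how much of the "hopefully legal SQL" problem disappears once the interface stops treating values as characters to be spliced in.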
As another example, compare make and Cons. While make is probably the most widely used auxiliary programming tool in the world, its mish-mash of declarative and imperative syntax makes even Perl look readable, and it is very poorly suited to large projects. Cons [Sidebotham 1996], on the other hand, is a Perl module. Customizing the build process for a particular project is a matter of passing some file names as constructor arguments, or overriding some methods, rather than learning a complex, arbitrary, and specialized syntax. What's more, integration with other build activities (such as running regression tests) is much easier.
"Oh, well," you say. "Many programmers still use emacs or vi, or some other ASCII editor. If you allowed WYSIWYG source, they wouldn't be able to read each other's programs. And most programmers don't know enough about programming languages to make heads or tails of lazy evaluation, type inference, or exotic concurrency mechanisms." The first point is true, but the fact that a few people still use lynx as a browser doesn't stop the rest of us from putting image maps in web pages. As for the second, the speed with which the C++ community has adopted generic programming (in the form of templates), and the degree to which multi-threaded programming is now taken for granted (in Java) leads me to believe that the average programmer is actually pretty smart.
Reviewers of early versions of this article tried to explain the conservatism of VHLLs in several ways. First, some reviewers felt that the VHLLs we have are good enough, and that greater improvements in productivity would come from better libraries and supporting tools, rather than yet more syntax. I agree that the real need is higher-level libraries, but think that languages such as Linda, Haskell, Icon, and J have proved that the incorporation of certain language features makes it much easier to build libraries in some domains.
A refinement of this argument was that while new language features might be useful on their own, their overall effect would be small, or even negative, because they would complexify the language. The addition of references to Perl, for example, made an already hard-to-read syntax even worse. One reviewer pointed out the way in which "simpler" scripting languages, like VBScript and PerlScript, keep appearing, then growing, until there is room underneath them for yet another "simpler" language to appear.
I think the solution here is that language designers need the courage to throw things away. Every Python tutorial I have read, for example, devotes a few paragraphs to justifying the existence of tuples. Their functionality is a strict subset of the functionality of Python lists--why not bite the bullet and get rid of them? Similarly, Python's three-level scope hierarchy (local, global, and built-in), and its requirement that the body of a lambda can only be an expression, make it needlessly difficult to write applicative programs. I believe that Scheme and other languages have proved that applicative programming is as powerful, as general, and as comprehensible an abstraction as object-oriented programming. Why not upgrade Python to let programmers take full advantage of it? Similarly, it must surely be time to take picture-format output out of Perl.
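The lambda restriction can be seen in a few lines. This is a minimal sketch with invented helper names (`lambda_accepts`, `compose`, `clamp`); it only shows that a lambda body must be a single expression, so any function that needs statements has to be given a name with `def` before it can be passed around.

```python
def lambda_accepts(body):
    """Return True if `lambda x: <body>` compiles as Python source."""
    try:
        compile("f = lambda x: " + body, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


def compose(*funcs):
    """Applicative helper: compose(f, g)(x) computes f(g(x))."""
    def composed(x):
        for func in reversed(funcs):   # apply the rightmost function first
            x = func(x)
        return x
    return composed


def clamp(x):
    # Needs an `if` statement, so it cannot be written inline as a lambda body.
    if x < 0:
        return 0
    return x


# The expression-only half fits in a lambda; the statement half needs a def.
double_then_clamp = compose(clamp, lambda x: x * 2)
```

In a language built for applicative programming both halves could be written inline where they are used.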
A subtler argument is that even if significant improvements in today's tools are possible, programmers are too overwhelmed by other changes in their environment to take advantage of them. One reviewer said that even if templates were added to Java, he'd be too busy learning Swing and Enterprise JavaBeans to figure them out.
The only long-term answer here is education. I believe that one reason for Java catching on so quickly as an educational language (it is now used in 80% of first-year college computing courses) is that its very conservative design contains little to frighten a generation of professors raised on Pascal. Students who aren't exposed to other programming paradigms as undergraduates will often exercise a silent veto later on by not adopting them, even when they are the best solution to the problem at hand.
VHLLs can help our profession get out of this trap by highlighting which concepts are worth learning. While some university professors have told me that they don't believe there is any industrial demand for Perl programmers (no, I'm not making that up), most feel growing pressure to make the content of courses more relevant to the real world. At the same time, the incorporation of type inference or programming by contract into a language like Python would do a lot to make it more academically respectable.
Jones 1997], and there are demonstrable productivity gains to be had from implementing directed graphs, callbacks, and the like without worrying about whether or not they are introducing circular data references. Until Perl and Python adopt this, I think their advocates should concede that Java is actually a higher-level language.
Second, I want to see a VHLL defined by an XML DTD. Doing this will allow me to put as much information into my program source as my niece can put into the email messages she composes using Netscape. It will also allow programmers to take direct advantage of the coming wave of XML manipulation tools to create class browsers, design recovery aids, and other source manipulation tools. Finally, if a program's source is defined using <method>, <parameter>, and <block> tags, then individual programmers can choose whatever superficial appearance they want. Three different programmers, for example, could view nesting using indentation (Python), curly braces (Perl), or parenthesized prefix notation (Scheme). I believe this would be as big an innovation in practical programming as applets were, and probably more useful.
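A toy version of this idea fits in a few lines of Python. The sketch assumes a made-up XML vocabulary: `<method>`, `<parameter>`, and `<block>` come from the text above, while the `name` attribute and the `<statement>` element are hypothetical, as are the two rendering styles. The brace style is only Perl-flavoured and is not meant to be valid Perl.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored form of one method; no such DTD exists.
SOURCE = """
<method name="greet">
  <parameter name="who"/>
  <block>
    <statement>print(who)</statement>
  </block>
</method>
"""


def render(xml_text, style):
    """Show the same stored method in the surface syntax a reader prefers."""
    method = ET.fromstring(xml_text.strip())
    name = method.get("name")
    params = ", ".join(p.get("name") for p in method.findall("parameter"))
    body = [s.text for s in method.find("block").findall("statement")]
    if style == "indent":            # nesting shown by indentation
        lines = ["def %s(%s):" % (name, params)]
        lines += ["    " + stmt for stmt in body]
    elif style == "braces":          # nesting shown by curly braces
        lines = ["sub %s(%s) {" % (name, params)]
        lines += ["    " + stmt + ";" for stmt in body]
        lines.append("}")
    else:
        raise ValueError("unknown style: " + style)
    return "\n".join(lines)


indented_view = render(SOURCE, "indent")
braced_view = render(SOURCE, "braces")
```

The stored form never changes; only the view does, and the same tree could just as easily feed a class browser or another source manipulation tool.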
Third, I would like VHLLs to start incorporating ideas that have emerged, and proved their worth, in the post-AWK era. Icon, Linda, Erlang, Haskell, Eiffel, and data-parallel languages like J and Fortran-90 (yes, Fortran) can all be plundered--err, used as sources of inspiration. Some specific suggestions include:
[Aho 1988] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger: The AWK Programming Language. Addison-Wesley, 1988, ISBN 020107981X.
[Freeman 1999] Eric Freeman, Susan Hupfer, and Ken Arnold: JavaSpaces Principles, Patterns, and Practice. Addison-Wesley, 1999, ISBN 0201309556.
[USENIX 1994] Tom Christiansen et al. (eds.): USENIX 1994 Very High Level Languages Symposium Proceedings. October 26-28, 1994, Santa Fe, New Mexico.