Have modern programming languages failed? From the point of view of learnability and maintainability, yes! What would a truly maintainable and learnable programming language look like? This is the third of a six-part series exploring the future of programming languages (read The World’s Most Maintainable Programming Language: Part 1, The World’s Most Maintainable Programming Language: Part 2, The World’s Most Maintainable Programming Language: Part 4, The World’s Most Maintainable Programming Language: Part 5, and The World’s Most Maintainable Programming Language: Conclusion).

Simplicity

Simple things are easier to learn, so the language will optimize for simplicity, having as few commands as possible.

Because the goal of the language is to be as easy to learn as possible, it must use only a few primitives. Why? Compare decimal math with hexadecimal. People who haven’t already studied programming or higher mathematics find decimal much easier to use. It’s obvious why; it uses almost 40% fewer primitives, lacking A - F!
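The arithmetic behind that "almost 40%" figure can be checked in a few lines. This is only a sketch that treats each digit symbol as one primitive; the variable names are illustrative and not part of any language discussed in this series.

```python
# Count the digit symbols ("primitives") each numeral system needs.
DECIMAL_DIGITS = "0123456789"
HEX_DIGITS = "0123456789ABCDEF"

# Decimal lacks A through F, so it needs six fewer symbols.
fewer = len(HEX_DIGITS) - len(DECIMAL_DIGITS)

# Express the saving as a fraction of the hexadecimal symbol count.
reduction = fewer / len(HEX_DIGITS)

print(f"decimal uses {fewer} fewer symbols ({reduction:.1%} fewer than hexadecimal)")
```

Six of sixteen symbols is 37.5%, which is where the rounded figure comes from.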

Consider also the endless homophonic confusion in natural language, where there are multiple possible valid consonant spellings for a single phoneme in differing contexts. Reducing a spoken language to a simple set of separate phonemic representations would undoubtedly make it easier to learn. Yet the ability to combine characters into words in a written language still allows expressibility and extensibility.

The same goes for a programming language.

An example may help. The Latin language is easy to learn (at least in comparison to modern languages) because each letter has a unique sound. (I do not ignore the difference between long and short vowel sounds, as properly written Latin uses the long bar notation to denote long vowels, removing the ambiguity.) If you can pronounce a word in Latin, you can spell it — and vice versa. This is a much better situation than even English, with confusing homophonic pairs and triplets including ghoti/fish, lead/lead, and deer/dear.

It’s possible to go too far in this direction. If you take Turing’s hypothetical universal computing machine and somehow manage to invent the infinite tape necessary to drive it, you only need four primitives. However, the semantic simplicity of such a system is too overwhelming. Perhaps eight to ten primitives is the right number. The goal of simplicity in this context is to create a system where it is impossible to write code containing a construct a newcomer to the language will not recognize.

It’s also much easier to read a one-page guide than a six-hundred-page dictionary.

Language design should focus on removing redundant features, options, and choices, to consolidate an essential core of high-level operations into a small, easily learned, unambiguous feature set.

Comprehensiveness

A language with support for different platforms and paradigms and tools is better than a language without, so the standard distribution will include support for everything useful.

If the most important thing you can do with a language is to learn it, the second most important thing you can do is to solve problems with it. Problems come in varying shapes and sizes, so the language designer should only rule out classes of problems to solve if the solution gets in the way of learnability. For example, while it’s possible to support Unicode in an efficient and effective manner, it’s impossible to do so transparently, or at least in a way that makes sense to novice programmers. Thus no language that supports Unicode is truly maintainable.

It’s worthwhile to examine two separate approaches by two existing, imperfect languages to find the right approach.

The Java language has many flaws, mostly related to inconsistency and overcomplexity. Yet people use it primarily because it has a huge standard library. The comprehensiveness of its support overcomes the deficiencies of the core language.

The Haskell language has a very small, simple core based on a few mathematical properties that most people learn in school. Yet it languished in adoption until the most popular implementations adopted a standard mechanism for file IO. Some users who have tried the monadic system might rightly assert that the particulars of this implementation added perhaps too much complexity to a system that already had enough primitives. This only goes to show the tension between simplicity and comprehensiveness and why it’s important to address both while designing the language. Would the Haskell designers have chosen a better set of primitives if they had considered monads at the start? Undoubtedly!

Though the existing literature in programming language design and research often refers to this small set of primitives as a core calculus, the term misleads novices. Where the goal of calculus is to resolve Zeno’s paradox by the recursive application of ever-smaller straight-edged rulers, the goal of designing a programming language is to approximate perfection by the application of ever more perfect language constructs. There is an obvious similarity, but the word “calculus” implies an asymptotic limit for the payoff-to-effort ratio. Because digital computers are completely digital at heart, with no confusion about 1 and 0, it must thus be possible to ratchet a hierarchy of layered abstractions to likewise eliminate all confusion in successively higher-level languages.

One point remains unaddressed: the issue of the language extension mechanism. As mentioned earlier, Java suffers from overcomplexity despite its useful core libraries. One of the reasons for this is an artificial distinction between primitives and extensions. For example, while the language rightly eschews operator overloading in general, as it is difficult to explain and fiendishly difficult to implement in practice without generating homophonic confusion, it allows it for the String classes. Novices must understand that these classes are special and different from all other classes that exist or may exist.

The PHP mini-language, at least until version 4, solved this problem by making everything a function. This is the hallmark of simplicity: all extensions produce functions that appear indistinguishable from built-in functions. A reader easily understands that mysql_connect() connects to a MySQL database, just as pg_connect() connects to a PostgreSQL database.

Other languages go too far in the opposite direction, layering too much abstraction. For example, in Perl’s non-core database access layer, DBI, connecting to a MySQL database and a PostgreSQL database use the same apparent code: DBI->connect( ... ). Though this appears to use the principle of consistency by reusing an apparent primitive, it actually suffers in that it does not clearly distinguish between different things. connect() is a false cognate because it is a transitive verb, always requiring a direct object.
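The contrast between the two styles can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the functions only borrow the names mysql_connect() and pg_connect() and return descriptive strings, the "driver:host" string is a simplified invention rather than DBI's actual data source syntax, and no real database library is involved.

```python
# Hypothetical stand-ins: no real database driver is imported or contacted.

# Style 1: one function per database, so the name alone says what is connected to.
def mysql_connect(host):
    return f"mysql connection to {host}"


def pg_connect(host):
    return f"postgresql connection to {host}"


# Style 2: a single connect() entry point; the database is named only inside
# the argument, so every call site looks the same.
_DRIVERS = {"mysql": mysql_connect, "pg": pg_connect}


def connect(data_source):
    # Split an assumed "driver:host" string into its two parts.
    driver, _, host = data_source.partition(":")
    if driver not in _DRIVERS:
        raise ValueError(f"unknown driver: {driver!r}")
    return _DRIVERS[driver](host)


# Both styles reach the same place; only the call site differs.
print(mysql_connect("db1"))
print(connect("mysql:db1"))
print(connect("pg:db1"))
```

In the first style, the target database is visible in the function name. In the second, a reader must inspect the argument to learn which database is involved, which is the lack of distinction described above.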

Still, even an imperfect language with comprehensive library support has advantages over a perfect language with sparse library support. (Ignore for the moment the revised ontological argument which proves that the most perfect possible language must have good library support.)