General purpose programming languages can be divided into two broad categories: Haskell and everything else.
In Part 1 of this series, I described how all programming languages are grouped into four families: imperative, object-oriented, logic, and functional. Most programming languages start with an imperative core and add object-oriented or functional features on top. C, for example, is a purely imperative programming language. Languages like C++, Java, and C# build on C and wrap up this imperative model with objects. Perl, Python, and Ruby add a little bit of functional programming into the mix as well.
Haskell is different because it is a purely functional language. That is, it supports only the functional style of programming. This unconventional approach gives Haskell a reputation of being a difficult language to learn and use, especially among professional programmers who casually jump around between languages like Java, Perl, Python, PHP, and Ruby. Most languages focus on what features to add and blend to make developers more productive. Haskell takes the opposite approach, removing needless features until there is nothing left to take away.
Functions are the heart of Haskell. However, functions in Haskell aren't like functions in other languages. Haskell functions are pure, meaning that they define a relationship between inputs and outputs, and have no side effects. This behavior mimics the notion of function you learned in algebra many years ago. Operations like square root, sine, and cosine are pure functions; the square root of four will always be two, regardless of how, where, or when it is computed.
Impure functions refer to external state, modify their behavior based on external state, or change external state. C functions like
malloc() are examples. Calling
malloc() returns a pointer to a different block of memory each and every time it is called, but when you attempt to allocate more than the operating system can provide,
malloc() will fail. Similarly, if you attempt to allocate a large block of memory,
malloc() may succeed, but cause the operating system to swap considerably, degrading performance for the entire machine. Clearly, impure functions are necessary, but unpredictable, because their behavior varies based on how, when, and where they are invoked.
Both kinds of functions are necessary to model the real world. Converting temperatures from Fahrenheit to Celsius is a pure function; by definition, 212° F should always convert to 100° C. Converting U.S. dollars to U.K. pounds is an impure function that is determined by a very large number of external variables, and there is no expectation that performing the same transaction at two different times or two different places should produce the same result.
Of course, to be a truly general purpose programming language, Haskell needs to support both pure and impure functions. Haskell makes impure functions available, but isolates them through the use of monads, which are actually built from pure functions. Monads are a deep and interesting topic, and are discussed in the next part of this series.
Because pure functions refer only to their inputs, they are deterministic, easy to understand, and easy to predict. For example, a function like:
square x = x * x
will always receive a single argument, and return the value of multiplying that argument by itself. Therefore, an expression like
square 2 always returns
4. A function that computes
2 * 2 and writes a message to the console or a logfile is an impure function whose behavior cannot be predicted; the computation may succeed, but the function may fail, due to the logging operation. Logging may succeed most of the time, but there is no way to predict whether this function will attempt to write to a closed file handle, write to a file on a filesystem that is no longer available, write to a full filesystem, or any of a number of other I/O failures.
The key to understanding pure functions is to think in terms of small, composable units of code. Simple operations like addition, multiplication, sine, cosine, and concatenation are all pure functions. Pure functions can be composed from other pure functions, like these here:
f x = (3 * x * x) + (6 * x) + 9
Many functions depend on more than one value, and accept multiple arguments:
min a b = if a < b then a else b
Note that Haskell can use indentation to signal when an expression is defined over multiple lines. This is known as the off-side rule, and is similar to Python's rules for block scoping. Unlike Python, Haskell's off-side rule is merely a convention for inferring where to place curly braces and semicolons around a series of expressions and statements. Either form is valid, but using the off-side rule is the more convenient and more common form. (The Haskell 98 Report defines the precise interpretation of the layout rule.)
Some functions are complex and refer to more involved sub-expressions. Here is a function that calculates the cost of a shopping basket, where some items are taxable, and others are not:
taxRate = 0.06 total cart = subtotal + tax where subtotal = sum cart taxable = filter isTaxable cart tax = (sum taxable) * taxRate
This example defines two functions,
taxRate, which returns a constant value, and
total, which computes the total cost of the list of items in a shopping cart. (Although the
taxRate definition appears to be defining a variable, it's best to think of it as a constant function, a function that takes no parameters and always returns the same value.) The definition of
total is quite expressive, and highlights the intent of the function, by isolating and naming important sub-expressions in the computation. (
total also refers to an
isTaxable function, not presented here.)
where clause in
total defines meaningful sub-expressions that are used to evaluate the primary expression,
subtotal + tax. The
subtotal, as you would expect, is just the sum of all the items in the cart. The definition of
tax refers to
taxable, which is the subset of all items in the cart that are taxable (more precisely, the set of items in the cart where
isTaxable is true for that item). Computing the tax involves computing the sum of the taxable subset, and multiplying it by the tax rate. The two utility functions
filter do what their names imply and are predefined in the Prelude, Haskell's standard library of useful functions.
We could also define
total by avoiding the sub-expressions in the
where clause, and writing the entire computation out in a fully expanded form. This version is just as correct, but much harder for people to understand:
total cart = (sum cart) + (sum (filter isTaxable cart)) * taxRate
The compiler doesn't care how you write these functions, because they are equivalent expressions of the same idea. In fact, the compiler is allowed to rewrite the first definition into the second. However, using tools like
where clauses and libraries of useful and reusable functions can help make functions concise and expressive, and easier for people to read.
Haskell has deep roots in the world of mathematics, and it is easy to see how it can be used to define simple mathematical functions. However, real-world problems have very little in common with simple mathematical functions.
Or do they?
Consider a program that needs to find the 10 most-used words in a document. Assume for the time being that the document is available as a string value. (Reading a file from disk or standard input would require using the I/O monad, which will be discussed in the next article in this series.)
Fortunately, there are a variety of pre-defined functions in the Prelude that can make this job simple. The first step is to convert a document into a list of words. This is precisely the operation performed by the
words function in the Prelude:
top10 doc = .... where listOfWords = words doc ...
Next, the list of words needs to be normalized so that variations in capitalization don't sway the results. One way to do this is to convert every word to lowercase. However, there is no
lowercase that converts every uppercase character in a string to its lowercase equivalent. But there is a
toLower function in the
Data.Char library that will lowercase one character. Using
map we can apply this transformation to each character in a string:
import Data.Char lowercase str = map toLower str
Or, more consisely:
import Data.Char lowercase = map toLower
The Haskell compiler can infer that
lowercase must take a string (a list of characters) as an argument, because it knows that
map takes two arguments (a function and a list), and
toLower converts one character into another. Because
map toLower needs one argument,
lowercase must need the same argument.
Updating the definition of
top10 is easy. Instead of splitting the document into a list of words, we can split the lowercased version of the document into a list of lowercased words:
import Data.Char lowercase = map toLower top10 doc = .... where listOfWords = words (lowercase doc) ...
Next, the words need to be sorted and grouped. These operations can be performed by the
group functions in the
Data.List library. The
sort function does what you would expect. The
group function takes a list of values and returns a list of nested lists, where each nested list contains adjoining equal values. For example:
$ ghci Prelude> :module + Data.List Prelude Data.List> group [1,3,3,1] [,[3,3],] Prelude Data.List> group (sort [1,3,3,1]) [[1,1],[3,3]]
Adding this feature to
top10 is straightforward:
import Data.List top10 doc = .... where listOfWords = words (lowercase doc) wordGroups = group (sort listOfWords) ...
Ordering words by frequency involves a little manipulation of the list of word groups. Each word group is a list of one or more instances of a single word. An easy way to order by frequency is to convert these lists into a pair of values, where the first element is the word count (frequency) and the second element is the word itself. A simple function can perform this action on a single list of words:
makePair l = (length l, head l)
Converting a list of word groups is as simple as using
map to apply this function over the entire list:
-- convert [["a", "a"], ["b"], ...] -- into [(2, "a"), (1, "b"), ...] map makePair wordGroups
makePair is deeply tied to the problem we are trying to solve and isn't very generic. Instead of creating a name for this throwaway function, we can use a lambda expression to define this function in-place:
-- same as above map (\l -> (length l, head l)) wordGroups
\ character starts a lambda, or an anonymous function. Named parameters are found to the left of the arrow and the body of the function is found to the right of the arrow.
Generating the list of words and their frequencies is only part of the solution. Next, the words need to be sorted in descending order:
reverse (sort (map (\l -> (length l, head l)) wordGroups))
Unfortunately, the nesting parentheses complicate the structure of this expression. We can use the
$ operator to clean up the expression like so:
reverse $ sort $ map (\l -> (length l, head l)) wordGroups
$ operator is a little bit of syntactic sugar. It takes everything to the right and treats it as a single expression, then uses the right hand side as the last parameter to the expression on the left hand side. Therefore, these two forms are actually seen by the compiler as the very same expression.
top10 function now looks like this:
top10 doc = .... where listOfWords = words (lowercase doc) wordGroups = group (sort listOfWords) frequencies = reverse $ sort $ map (\l -> (length l, head l)) wordGroups ...
Next, the list of
(frequency, word) pairs need to be reduced to a list of words. Pairs are such a useful data structure in Haskell programs that the Prelude includes two functions to get at their contents:
fst returns the first element in a pair, and
snd returns the second element in a pair. To get the words out of these pairs, we need to use
snd. To convert the list of pairs into a list of words, we just need to use
top10 doc = .... where listOfWords = words (lowercase doc) wordGroups = group (sort listOfWords) wordPairs = reverse $ sort $ map (\l -> (length l, head l)) wordGroups wordsByFreq = map snd wordPairs
Finally, the goal of the
top10 function is to find the ten most frequent words in a document. Since
wordsByFreq is ordered from most frequent to least frequent, all we need to do is find the first ten elements in the list. This behavior is captured in the
take function in the Prelude.
take has two parameters: a desired size and a list of values; it returns a list no larger than the desired size.
With this one last piece of the puzzle, here is the complete definition of the
import Data.Char import Data.List lowercase = map toLower top10 doc = take 10 wordsByFreq where listOfWords = words (lowercase doc) wordGroups = group (sort listOfWords) wordPairs = reverse $ sort $ map (\l -> (length l, head l)) wordGroups wordsByFreq = map snd wordPairs
Loading the full text of RFC 822 into a value named
rfc822, and running that string through
top10 produces this result:
$ ghci top10.hs ___ ___ _ / _ \ /\ /\/ __(_) / /_\// /_/ / / | | GHC Interactive, version 6.6, for Haskell 98. / /_\\/ __ / /___| | http://www.haskell.org/ghc/ \____/\/ /_/\____/|_| Type :? for help. Loading package base ... linking ... done. [1 of 1] Compiling Main ( top10.hs, interpreted ) Ok, modules loaded: Main. *Main> rfc822 <- readFile "rfc0822.txt" *Main> top10 rfc822 ["the","of","to","a","is","and","be",";","for","in"] *Main>
Using this bottom-up approach to defining a function like
top10, we could write a series of similar functions. For example, a version that strips leading and trailing punctuation characters on each word. Or a version that builds a unique list of words, for use inside a spell checker. Or a version that returns both words and frequencies. Or a version that ignores stopwords like the, of and to in the source text. Writing a series of these functions will likely identify common components, like a function that converts a list of words to a list of (
frequency, word) pairs, or a function that converts a list of pairs into a list of words.
The list goes on, and I encourage you to play with this simple little function to explore Haskell further.
Many experienced programmers feel a kind of culture shock when first approaching Haskell. How can you program without mutable variables? Where are the objects? The class hierarchies? The design patterns? How do you debug a program without peppering print statements about? If pure functions can model simple mathematical concepts, can they also model real programming problems?
In this article, I've demonstrated how to develop real code using only pure functions. This approach is both powerful and different from what most programmers use today. Haskell encourages you to think in terms of small, powerful functions that do one thing and do it well. To help you in your efforts, the standard libraries provide a wealth of functions to study, reuse, and combine into interesting and novel functions that let you focus on the problems you are trying to solve, not on how to make your problem conform to an API or design pattern.
Functional programming in Haskell is a very rich, very deep topic that can't be fully explained in a single article. Visit the Haskell wiki to discover some of the more interesting topics, like partial application, higher order functions, function composition, and currying.
Adam Turoff is a software developer with more than a dozen years of experience in building web backend systems with open source tools. His current project uses a mixture of Perl, Ruby, Rails, Haskell, and other tools.
Return to ONLamp.com.
Copyright © 2009 O'Reilly Media, Inc.