I’ve been reading Beautiful Code, picking out chapters here and there as I have time. While reading Brian Kernighan’s explanation of Rob Pike’s regular expression program from The Practice of Programming, I had an idle thought. “Hey, that’s a highly recursive program with complex behavior suitable for didactic purposes.”

Of course, Kernighan says that almost verbatim in the text. He also says “It’s a nicer example than Yet Another Fibonacci Sequence Generator.”

So I ported it to Haskell. I don’t promise it’s necessarily great Haskell, and I wouldn’t consider it entirely beautiful, but it appears to function.

match :: String -> String -> Bool
match ('^':xs) y = matchhere xs y
match x        y = matchhere x y

matchhere :: String -> String -> Bool
matchhere []         _      = True
matchhere _          []     = False
matchhere (x:'*':xs) y      = matchstar x xs y
matchhere ('$':xs)   y      = y == []
matchhere ('.':xs)   (y:ys) = matchhere xs ys
matchhere (x:xs)     (y:ys) = if x == y then matchhere xs ys else False

matchstar :: Char -> String -> String -> Bool
matchstar c  [] (y:ys) = c == y
matchstar c   x (y:ys) | c == y    = matchstar c x ys
                       | c == '.'  = matchstar c x ys
                       | otherwise = matchhere x (y:ys)
matchstar c   x _      = c == '.'

The C version has 24 significant lines of code, by my count. My Haskell version has 13. I’m sure a good Haskell hacker could simplify as well; the guard clauses in matchstar don’t thrill me.

My test code is so ugly that I almost hate to share it, but after my second time of typing :l regex.hs into GHCI, I decided I needed a driver. Avert your eyes:

main :: IO ()
main = do {
    checkMatch "foo"  "foo";
    checkMatch "foo"  "bar";
    checkMatch ""     "foo";
    checkMatch "."    "Foo";
    checkMatch ".*"   "Foo";
    checkMatch "F."   "Foo";
    checkMatch "F."   "foo";
    checkMatch "F.*"  "Foo";
    checkMatch "Fo*"  "Foo";
    checkMatch "Fo*d" "Foo";
    checkMatch "Fo*d" "Foooood";
}

checkMatch :: String -> String -> IO ()
checkMatch pat str =
    if match pat str then
        putStrLn ( "/" ++ pat ++ "/ =~ \"" ++ str ++ "\"" )
    else
        putStrLn ( "/" ++ pat ++ "/ !~ \"" ++ str ++ "\"" )

Overall, I’m reasonably pleased with the translation. I wish I had a better way to reduce some the number of variants of matchhere, but the translation was reasonably straightforward.

I’m sure a really Haskellish version could be shorter still.