I’ve been reading Beautiful Code, picking out chapters here and there as I have time. While reading Brian Kernighan’s explanation of Rob Pike’s regular expression program from The Practice of Programming, I had an idle thought. “Hey, that’s a highly recursive program with complex behavior suitable for didactic purposes.”
Of course, Kernighan says that almost verbatim in the text. He also says “It’s a nicer example than Yet Another Fibonacci Sequence Generator.”
So I ported it to Haskell. I don’t promise it’s necessarily great Haskell, and I wouldn’t consider it entirely beautiful, but it appears to function.
match :: String -> String -> Bool
match ('^':xs) y = matchhere xs y
match x y = matchhere x y
matchhere :: String -> String -> Bool
matchhere [] _ = True
matchhere _ [] = False
matchhere (x:'*':xs) y = matchstar x xs y
matchhere ('$':xs) y = y == []
matchhere ('.':xs) (y:ys) = matchhere xs ys
matchhere (x:xs) (y:ys) = if x == y then matchhere xs ys else False
matchstar :: Char -> String -> String -> Bool
matchstar c [] (y:ys) = c == y
matchstar c x (y:ys) | c == y = matchstar c x ys
| c == '.' = matchstar c x ys
| otherwise = matchhere x (y:ys)
matchstar c x _ = c == '.'
The C version has 24 significant lines of code, by my count. My Haskell version has 13. I’m sure a good Haskell hacker could simplify as well; the guard clauses in matchstar don’t thrill me.
My test code is so ugly that I almost hate to share it, but after my second time of typing :l regex.hs into GHCI, I decided I needed a driver. Avert your eyes:
main :: IO ()
main = do {
checkMatch "foo" "foo";
checkMatch "foo" "bar";
checkMatch "" "foo";
checkMatch "." "Foo";
checkMatch ".*" "Foo";
checkMatch "F." "Foo";
checkMatch "F." "foo";
checkMatch "F.*" "Foo";
checkMatch "Fo*" "Foo";
checkMatch "Fo*d" "Foo";
checkMatch "Fo*d" "Foooood";
}
checkMatch :: String -> String -> IO ()
checkMatch pat str =
if match pat str then
putStrLn ( "/" ++ pat ++ "/ =~ \"" ++ str ++ "\"" )
else
putStrLn ( "/" ++ pat ++ "/ !~ \"" ++ str ++ "\"" )
Overall, I’m reasonably pleased with the translation. I wish I had a better way to reduce some the number of variants of matchhere, but the translation was reasonably straightforward.
I’m sure a really Haskellish version could be shorter still.

Little bug: by putting the empty list match before the '$' match, "Foo$" doesn't match "Foo".
It appears the code has some bugs. For instance,
1) match "oo" "foo" returns False
2) match "foo$" "foo" returns False
3) match ".*" "foo" returns False
4) match "a*a" "aaa" returns False
Another complaint I would have is that the names "text" and "regexp" make the original C code much easier to understand. When reading the Haskell code I often got confused between what the x's and y's were supposed to represent.
-Andrew
Tweaked version with bug fixes and more tests:
http://hpaste.org/2369
It looks like this test fails:
checkMatch ".*" "Foo";
it evaluates as
match ".*" "Foo"
matchhere ('.':'*':[]) "Foo"
matchstar '.' [] ('F':"oo")
'.'=='F'
False
This is because you did not use guards on the first definition of 'matchstar'. Here is what may be a corrected version:
matchstar :: Char -> String -> String -> Bool
matchstar '.' xs (_:ys) = matchstar '.' xs ys
matchstar c xs (y:ys) | c==y = matchstar c xs ys
| otherwise = matchhere xs (y:ys)
"Hey, that's a highly recursive program with complex behavior suitable for didactic purposes."
Do you really have thoughts like that? : )
Updated version of ptolomys that correctly (I think?) handles ^.
http://hpaste.org/2375
chromatic, also see http://www.nondot.org/~sabre/Projects/HaskellLexer/index.html
Hi,
I am reading the same book. I noticed that match ("a*bcd", "xyzbcd")=1, is it acceptable?
From my understanding to regular expression, it's not correct.
@Ruiyu,
Your example looks correct to me. The star metacharacter matches zero or more occurrences of the previous character, and there are indeed zero
acharacters in the string to match.