One of the ideas that came up on the Ruby web page redesign list was a rotating set of application and library spotlights. The idea didn’t take root, but Martin DeMello produced this spotlight on glark, which I’m posting here with his permission.
I’ve often joked that the Ruby community seems to produce far more libraries than it does actual applications. One of my favourite exceptions is glark, a command-line utility that does everything you’ve always wished grep did, and some things you might not even have thought to wish for.
The glark project page introduces it as:
A replacement for (or supplement to) the grep family, glark offers: Perl compatible regular expressions, highlighting of matches, context around matches, complex expressions (“and” and “or”), and automatic exclusion of non-text files.
Even with just the first of these features, glark would have been invaluable; indeed, most of the time I use it as nothing more than a PCRE-enabled grep. But that’s not all there is to it: glark has a plethora of features that I might not use every day, but which are extremely handy when I do need them. Here’s a quick look at some of the more useful ones.
context and highlighting
By default, glark highlights matching strings using ANSI escape sequences; a grep-compatibility mode (--grep) turns this off. Another useful option is match context: the -A n and -B n flags print n lines after and before each match, respectively. glark can also print the entire text of matching files with the matched portions highlighted, and the --extract-matches flag prints only the matching portion of each line.
glark provides four operators for building up complex patterns: --or, --xor, --and=n (where --and=n a b means match a within n lines of b) and ! (which inverts a regex). For example, this finds all occurrences of ‘print’ or ‘puts’ within two lines of ‘if debug’, matched case-insensitively so that ‘if DEBUG’ also counts:
glark --and=2 '/if debug/i' --or print puts *.rb
and this matches all lines that have an octal number and aren’t commented out:
glark --and=0 '!/^#/' '\b0[0-7]+\b' *.rb
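To make the --and=n semantics concrete, here is a minimal Ruby sketch of the idea. It is not glark’s implementation: re, not_, any_of and and_within are hypothetical names, and the sketch assumes a line counts as a match when it satisfies one operand and some line at most n lines away satisfies the other.

```ruby
# Hypothetical predicate builders; not part of glark.
def re(rx)
  ->(line) { rx.match?(line) }                   # true when the regexp matches
end

def not_(pred)
  ->(line) { !pred.call(line) }                  # inverts a predicate, like glark's !
end

def any_of(*preds)
  ->(line) { preds.any? { |p| p.call(line) } }   # behaves like --or
end

# Sketch of --and=n: returns the sorted indices of lines that satisfy one
# operand while some line at most n lines away satisfies the other.
def and_within(lines, a, b, n)
  a_hits = lines.each_index.select { |i| a.call(lines[i]) }
  b_hits = lines.each_index.select { |i| b.call(lines[i]) }
  near = ->(i, others) { others.any? { |j| (i - j).abs <= n } }
  (a_hits.select { |i| near.call(i, b_hits) } +
   b_hits.select { |j| near.call(j, a_hits) }).uniq.sort
end

source = ["if debug", "  puts x", "end", "", "print y"]
p and_within(source, re(/if debug/i), any_of(re(/print/), re(/puts/)), 2)
# => [0, 1]
```

With n = 0 both operands must hold on the same line, which is how the octal example above excludes commented-out lines.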
Another very welcome addition is a set of features borrowed from find(1). Both the basename and the full filename can be matched against regular expressions (again, full-fledged PCREs, not globs), and the soon-to-be-released version 1.7.10 will fully support set differences. For example,
glark --with-filename '\.c$' --without-filename '^test-' foo .
will search for foo in all .c files under the current directory, except those whose names start with ‘test-’.
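The set difference amounts to “files matching the first expression, minus files matching the second”. A minimal Ruby sketch, assuming both expressions are tested against the basename (candidate_files is a hypothetical name, not a glark API):

```ruby
# Hypothetical sketch of filename set differences; not glark's code.
# Keeps paths whose basename matches `with`, then subtracts those whose
# basename matches `without`.
def candidate_files(paths, with:, without:)
  included = paths.select { |path| with.match?(File.basename(path)) }
  excluded = paths.select { |path| without.match?(File.basename(path)) }
  included - excluded # Array#- keeps the order of `included`
end

paths = ["src/main.c", "src/test-main.c", "src/main.h", "lib/util.c"]
p candidate_files(paths, with: /\.c$/, without: /^test-/)
# => ["src/main.c", "lib/util.c"]
```

In a real run the path list could come from something like Dir.glob("**/*").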
Binary files are automatically skipped by checking a sample of the file for an excessive number of non-ASCII characters; glark also lets you specify a list of known-text and known-non-text extensions for which it will skip the check.
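The binary check can be sketched in Ruby along these lines. The sample size, the 30% threshold and the two extension lists are made-up values for illustration, not glark’s actual defaults, and binary_sample? and skip_file? are hypothetical names.

```ruby
SAMPLE_SIZE = 1024                    # hypothetical: bytes read from the file
MAX_NON_ASCII_RATIO = 0.3             # hypothetical threshold
TEXT_EXTENSIONS = %w[.rb .c .txt]     # hypothetical known-text list
BINARY_EXTENSIONS = %w[.class .o .png] # hypothetical known-non-text list

# True when too many bytes in the sample fall outside 7-bit ASCII.
def binary_sample?(bytes)
  return false if bytes.empty?
  non_ascii = bytes.each_byte.count { |b| b > 127 }
  non_ascii.fdiv(bytes.bytesize) > MAX_NON_ASCII_RATIO
end

# Known extensions bypass the sampling check entirely.
def skip_file?(path)
  ext = File.extname(path)
  return false if TEXT_EXTENSIONS.include?(ext)
  return true if BINARY_EXTENSIONS.include?(ext)
  sample = File.open(path, "rb") { |f| f.read(SAMPLE_SIZE) } || ""
  binary_sample?(sample)
end

p binary_sample?("plain text\n")          # => false
p binary_sample?("\xFF\xFE\x00\x80".b)    # => true
```

Checking the extension lists first means known files never have to be opened at all.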
Other neat touches are the ability to exclude files whose names match the string being searched for (useful, for example, for finding all external references to a file) and the option to skip files above a certain size.
If you set local-config-files: true in your ~/.glarkrc, glark will search upward from the current directory for the nearest .glarkrc file and use it to adjust the settings taken from ~/.glarkrc. This is particularly useful for changing the definition of a binary file; the author gives the example of .class files, which are binary in a Java project but text in a PHP one.
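A rough Ruby sketch of that lookup follows. It assumes .glarkrc files consist of simple “name: value” lines and that local settings override the home ones; the helper names are hypothetical, and glark’s real parsing may well differ.

```ruby
require "pathname"

# Hypothetical parser: one "name: value" pair per line; other lines ignored.
def parse_rc(text)
  text.each_line.each_with_object({}) do |line, settings|
    line = line.strip
    next unless line.include?(":")
    key, value = line.split(/\s*:\s*/, 2)
    settings[key] = value
  end
end

# Walk from `start` toward the filesystem root; return the first .glarkrc found.
def find_local_glarkrc(start = Dir.pwd)
  dir = Pathname.new(start).expand_path
  loop do
    candidate = dir + ".glarkrc"
    return candidate if candidate.file?
    return nil if dir.root?
    dir = dir.parent
  end
end

# Home settings, with the nearest local file layered on top when enabled.
def effective_config(home_rc_text, start = Dir.pwd)
  config = parse_rc(home_rc_text)
  return config unless config["local-config-files"] == "true"
  local = find_local_glarkrc(start)
  local ? config.merge(parse_rc(local.read)) : config
end
```

Hash#merge gives the later (local) file the last word, which is what lets a PHP project reclassify .class files without touching ~/.glarkrc.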
I asked Jeff, glark’s author, about his future plans for it, and he was kind enough to respond at some length:
Well, I’m realizing that I really should write a tutorial/overview of glark, since the man page doesn’t adequately cover its features. It’s also expanded into the realm of find(1), and I’d like to further develop that, perhaps by integrating other features from find into glark. I’m not sure that glark should try to be grep+find, but it’s certainly nice to have regular expressions and not just globs for file matching, so I can appreciate the desire for that. I’d also like to convert globs into regular expressions automatically, so if someone writes “--fullname=*.rb” it will get converted properly into a regexp, yet will also provide an interface consistent with that of find.
The main thing that I’m doing with glark is migrating away from an application to a library. What I hope to do with that is to make it more scriptable, so that small programs can be written against it, and thus the .glarkrc files won’t be so complicated. This would make possible something that I really want in Java code, akin to “search for this string, find the variable associated with it, and then tell me where that variable is used”. I run that quite often — for searching message strings — and it would be nice to automate it instead of doing it as several clumsy steps. That’s the type of behavior that exceeds the limits of a data file, and by reimplementing glark as a library, it puts Ruby first, and the glark library more in the background. That is, it would be possible to *program* with glark, not just run it.
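Jeff’s idea of turning globs into regular expressions is easy to sketch. The Ruby below is only an illustration, not his planned implementation: glob_to_regexp is a hypothetical name, and it handles only *, ? and literal characters (no brace or bracket expressions).

```ruby
# Hypothetical glob-to-regexp converter; supports only *, ? and literals.
def glob_to_regexp(glob)
  body = glob.chars.map { |c|
    case c
    when "*" then "[^/]*"      # any run of characters, not crossing a "/"
    when "?" then "[^/]"       # exactly one character other than "/"
    else Regexp.escape(c)      # everything else must match literally
    end
  }.join
  Regexp.new("\\A#{body}\\z")  # anchored: the whole name must match
end

rx = glob_to_regexp("*.rb")
p rx.match?("glark.rb")   # => true
p rx.match?("glark.rbx")  # => false
```

Stopping * at / mirrors shell globbing; whether a glob given to --fullname should instead let * span directories is the kind of choice such a conversion would have to settle.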
The latest version of glark is always available from the project page.
If you use Gentoo, there’s an ebuild, but it usually lags a version behind.