The Power of mdfind
by Andy Lester
Tiger introduced Spotlight, a powerful search mechanism that indexes the contents of all the files on your Mac and finds things almost like magic. Want to find all the documents that mention Perl? Spotlight will do it for you. Just press ⌘+space bar (Command-space bar), and the Spotlight search box pops up. Type in "Perl", and even before you hit the Return key, Spotlight is searching its magic indices for the results.
Note that the documents found are grouped by type, and even include things you might not consider as documents. In this case, the search for Perl on my iBook found mail messages and Keynote presentations, as you'd probably expect, but also Address Book and iCal entries. It's also reassuring that it found mdfind.html, the working version of this article. Sample results from searching my iBook for "perl" are shown below.
All the Spotlight-related functionality is based on the idea of metadata, or data about data itself. A Word document contains data, but the fact that the document was created on October 18th, 2001 is metadata. (Metadata is abbreviated "MD" throughout.) Keyword searching on the contents of files, not just filenames, is the most obvious way to use Spotlight, but it can also search based on date ranges ("Where's that mail message about OSCON that I wrote last week?") and document types ("Didn't I have a PDF that had Perl testing shortcuts on it?"). The full Spotlight search pane, opened by clicking the Show All option at the top of the mini list, gives all the details.
In addition to the little blue magnifying glass in the upper-right corner of your desktop, Tiger provides the mdfind and mdls commands. When I discovered them while working on my updates to Mac OS X Tiger In A Nutshell, I fell in love: I had the power of Spotlight available to me from the Unix shell. Experienced Unix users will find mdfind's interface familiar. Mac power users who have never used the Unix under the hood of Tiger are in for a treat.
I've worked on a number of different books for different publishers, and files relating to those projects are scattered around my hard drive. Let's say I'm looking for a copy of an invoice I sent to Apress for my work on Pro Perl Debugging. The easiest way to start is to ask mdfind to do a simple keyword search on the word "invoice."
mdfind invoice
/Applications/Microsoft Office 2004/Templates/Business Forms/Invoices
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45070.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45071.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45068.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45069.emlx
/Users/andy/Desktop/Tape backup.doc
/Users/books/oracle-cd/webapp/ch09_05.htm
... 102 more filenames ...
What I get back is a list of 110 files that match the word "invoice" somewhere in their contents. The first hit is a directory of templates created by Office, followed by some mail messages about custom work being done for a customer in my day job. Then there's a document proposing a new tape backup server, and then a page from the (unfortunately discontinued) Oracle CD Bookshelf that I've copied to my local hard drive. Digging through 110 filenames, especially if I have to open them to see what's inside, would be tedious.
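Before narrowing the query, it can help to get a quick count and a preview instead of reading every path. mdfind prints one path per line, so its output can be piped into ordinary shell code. The sketch below is only an illustration: `summarize_hits` is a hypothetical helper name, not part of Spotlight. It reads paths from standard input so that it can be tried without Spotlight. Because several of the paths above contain spaces ("Tape backup.doc"), it reads whole lines rather than splitting on whitespace.

```shell
#!/bin/sh
# Hypothetical helper: preview the first few paths on stdin, then print
# how many there were. Input is one path per line, as mdfind prints it.
# "IFS= read -r" keeps each line intact, so spaces in names survive.
summarize_hits() {
    limit=${1:-3}              # how many paths to preview (default 3)
    count=0
    while IFS= read -r path; do
        count=$((count + 1))
        if [ "$count" -le "$limit" ]; then
            printf '  %s\n' "$path"
        fi
    done
    printf 'total: %d\n' "$count"
}

# Demo with three paths copied from the listing above. It prints the
# first 2 paths, then "total: 3".
# On a Mac you could instead run:  mdfind invoice | summarize_hits 5
printf '%s\n' \
    "/Applications/Microsoft Office 2004/Templates/Business Forms/Invoices" \
    "/Users/andy/Desktop/Tape backup.doc" \
    "/Users/books/oracle-cd/webapp/ch09_05.htm" \
    | summarize_hits 2
```

For a bare count, `mdfind invoice | wc -l` does the same job with a standard tool.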
I'll narrow down the search by adding terms. Since I did the work for Apress, I'll add that as a keyword. All words specified in a search term are ANDed together. Because I'm passing multiple words and want the Unix shell to pass them as one argument to mdfind, I need to put them in double quotes.
mdfind "invoice apress"
/Users/alester/pro-perl-debugging/admin/TR.Invoice.Lester.Foley and McMahon.doc
/Users/alester/pro-perl/admin/TR.Invoice.Lester.Wainwright.doc
Now we're down to two hits, and it's clear I want the first file.
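To see why the double quotes matter, it helps to look at what the shell actually hands to a command. This sketch uses a hypothetical stand-in function, `show_args`, in place of mdfind. It prints how many arguments it received and shows each one in brackets, so nothing here depends on Spotlight.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for mdfind: it reports how many
# arguments the shell passed and prints each one between brackets.
show_args() {
    printf '%d arg(s):' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

show_args invoice apress      # prints: 2 arg(s): [invoice] [apress]
show_args "invoice apress"    # prints: 1 arg(s): [invoice apress]
```

Without quotes, the shell splits the words into two separate arguments. With quotes, the command receives the whole query as one string.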
All words passed in a query string to mdfind are implicitly ANDed together. That is, "invoice apress" means both words must appear. Spotlight allows other Boolean operators as well:
|, the pipe character, means Boolean OR
-, the hyphen, means exclude a term
Working with these operators can be tricky, because whitespace is significant when building queries. To get all documents with "invoice" or "o'reilly", I write
mdfind "invoice|o'reilly"
with no spaces between the terms. If I want to find all documents with "invoice" but not "apress", it's
mdfind "invoice-(apress)"
with no intervening spaces, and parentheses around the term I want to exclude. To get a list of invoices or contracts from O'Reilly, I'd use
mdfind "(invoice|contract) o'reilly"
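Because whitespace and punctuation matter, it can help to assemble longer queries from smaller pieces. Here is a sketch of that idea. The helper names `any_of` and `without` are hypothetical, and the strings they produce simply follow the rules described above: terms joined by `|` with no spaces for OR, a hyphen and a parenthesized term for exclusion, parentheses for grouping, and a space for AND. The script only prints the queries, so it runs anywhere. It does not check that Spotlight accepts them. On a Mac you would pass each result to mdfind in double quotes.

```shell
#!/bin/sh
# Hypothetical helpers that assemble query strings using the operator
# rules described in the text.

# any_of a b c  ->  (a|b|c)   terms joined by | with no spaces, grouped
any_of() {
    query=$1
    shift
    for term in "$@"; do
        query="$query|$term"
    done
    printf '(%s)' "$query"
}

# without base term  ->  base-(term)   exclude term from the results
without() {
    printf '%s-(%s)' "$1" "$2"
}

q1="$(any_of invoice contract) o'reilly"   # a space means AND
q2=$(without invoice apress)

printf '%s\n' "$q1"    # (invoice|contract) o'reilly
printf '%s\n' "$q2"    # invoice-(apress)

# On a Mac:  mdfind "$q1"
```

Quoting `"$q1"` when passing it to mdfind matters for the same reason as in the hand-typed examples. Without the quotes, the shell would split the query at the space.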
Note that in all these examples with more than a single word, I'm using double quotes around the search term. This makes the Unix shell pass the multiple words as a single parameter. Otherwise, mdfind uses only the last word, so that
mdfind invoice contract
is the same as
mdfind contract
Quoting also prevents the shell from intercepting characters it would treat as special, like parentheses, and passes them unmolested to mdfind. This is especially important if I try to search for "O'Reilly". Without quotes, I get this:
mdfind O'Reilly
>
The angle bracket is the shell telling me "You started a quoted string, and now I'm waiting for you to finish it." The shell has interpreted the single quote in "O'Reilly" as the start of a quoted string, so it will sit and wait for input until it sees another single quote or I press Ctrl-C to cancel. Instead, I want