Using Perl to Read Mail
by David N. Blank-Edelman, 07/01/2000
It's been said that if you work on any program long enough, that program will eventually be able to send electronic mail. It doesn't matter what the original purpose of the program was (if you can still remember)--if you develop it long enough, some day that program will send its first piece of email.
From the vantage point of a systems or network administrator, this means there are lots and lots of programs out there generating mail daily. Mail filters like procmail can help us with this deluge by sorting through the mail stream. But sometimes it is more effective to write sophisticated programs to actually read the mail for us. For example, we might write a program to analyze unsolicited commercial email (spam) or one that keeps long-term statistics based on daily diagnostic email from a server.
To begin the process of writing these programs, you first have to know how to take apart an email message or mailbox. This excerpt from my upcoming O'Reilly book, Perl for System Administration, describes this process.
Let's start with the basics and look at the tools available for the dissection of both a single mail message and an entire mailbox. For the first topic, we will turn to Graham Barr's MailTools package to use the Mail::Internet and Mail::Header modules.
Dissecting a Single Message
The Mail::Internet and Mail::Header modules offer a convenient way to slice and dice the headers of an RFC822-compliant mail message. RFC822 dictates the format of a mail message, including the names of the acceptable header lines and their formats. To use Mail::Internet, you first feed it an open filehandle to a message file or a reference to an array that already holds the lines of a message:
use Mail::Internet;
$messagefile = "mail";
open(MESSAGE,"$messagefile") or
  die "Unable to open $messagefile:$!\n";
$message = new Mail::Internet \*MESSAGE;
close(MESSAGE);
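The array-reference form mentioned above can be sketched as follows. This is a minimal illustration, assuming MailTools is installed; the addresses and message text are hypothetical sample data. A copy of the array is passed in because the constructor may consume the lines it is handed.

```perl
use strict;
use warnings;
use Mail::Internet;

# Hypothetical sample message, one line per array element.
my @lines = (
    "From: alice\@example.com\n",
    "To: bob\@example.com\n",
    "Subject: Disk usage report\n",
    "\n",
    "/var is 91% full.\n",
);

# Pass a copy of the array: the constructor may consume the lines it is given.
my $message = Mail::Internet->new([@lines]);

# Print the parsed message back out as a single string.
print $message->as_string;
```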
If we want to parse a message arriving in a stream of input (i.e., piped to us on our standard input), we could do this:
use Mail::Internet;
$message = new Mail::Internet \*STDIN;
Mail::Internet hands us back a message object instance. We'll commonly use two methods with this instance: body() and head(). body() just returns a reference to an anonymous array that contains all the lines of the body of the message. head() is more interesting and offers a nice segue to the Mail::Header module.
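As a rough sketch of those two methods in use (assuming MailTools is installed; the message contents here are made up for illustration), we might walk the body lines and check what head() hands back:

```perl
use strict;
use warnings;
use Mail::Internet;

# Hypothetical message built in memory so the sketch needs no mail file.
my $message = Mail::Internet->new([
    "From: backup\@example.com\n",
    "Subject: Nightly backup\n",
    "\n",
    "Backup finished.\n",
    "No errors.\n",
]);

my $body = $message->body;   # reference to an array of body lines
my $head = $message->head;   # header object (a Mail::Header instance)

print "Body lines:\n";
print "  $_" for @$body;     # each line still carries its newline
print "head() returned a ", ref($head), " object\n";
```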
Mail::Header is implicitly loaded whenever we load Mail::Internet. If we call Mail::Internet's head() method, it returns a Mail::Header header object instance. This is the same object instance we would get if we changed our first Mail::Internet example code to use Mail::Header explicitly:
use Mail::Header;
$messagefile = "mail";
open(MESSAGE,"$messagefile") or
  die "Unable to open $messagefile:$!\n";
$header = new Mail::Header \*MESSAGE;
close(MESSAGE);
The $header object holds the headers of that message and offers us several handy methods to get at this data. For instance, to print a sorted list of the header names (which the module calls "tags") appearing in the message, we could add this to the end of the previous code:
print join("\n",sort $header->tags);
Depending on the message, we'd see something like this:
Cc
Date
From
Message-Id
Organization
Received
Reply-To
Sender
Subject
To
If we need to retrieve all of the Received: headers from a message, here's how we'd do it:
@received = $header->get("Received");
Often we use the Mail::Header methods in conjunction with a Mail::Internet object. If we were using Mail::Internet to return an object that contained both the body and the headers of a message, we might chain some of the methods from both modules together like this:
@received = $message->head->get("Received");
Note that we're calling get() in a list context. In a scalar context, it will return the first occurrence of that tag unless you provide it with an occurrence index as an optional second argument. The index counts from zero, so get("Received",1) will return the second Received: line in the message. There are a number of other methods provided by Mail::Header to add and delete tags in a header. See the module documentation for more information.
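The difference between the two calling contexts can be seen in a small sketch like the one below. It assumes MailTools is installed and uses a hypothetical in-memory message with two Received: lines; the host names are invented.

```perl
use strict;
use warnings;
use Mail::Internet;

# Hypothetical message carrying two Received: headers.
my $message = Mail::Internet->new([
    "Received: from b.example.com by c.example.com; 1 Jul 2000 10:02:00 -0400\n",
    "Received: from a.example.com by b.example.com; 1 Jul 2000 10:01:00 -0400\n",
    "From: alice\@example.com\n",
    "Subject: test\n",
    "\n",
    "Hello.\n",
]);

# List context: every occurrence of the tag.
my @received = $message->head->get("Received");

# Scalar context: only the first occurrence.
my $first = $message->head->get("Received");

# Returned values may end in a newline, so trim before printing.
chomp(@received);
chomp($first);

print scalar(@received), " Received: headers found\n";
print "First one: $first\n";
```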
Dissecting a Whole Mailbox
Taking this subject to the next level, where we slice and dice entire mailboxes, is straightforward. If our mail is stored in "classical Unix mbox" format or qmail (another Message Transfer Agent (MTA) à la sendmail) format, we can use Mail::Folder by Kevin Johnson. Many common non-Unix mail agents like Eudora store their mail in classical Unix mbox format as well, so this module can be useful on multiple platforms.
The drill is very similar to the examples we've seen before:
use Mail::Folder::Mbox; # for classic Unix mbox format
$folder = new Mail::Folder('mbox',"filename");
The new() constructor takes the mailbox format type and the filename to parse. It returns a folder object instance through which we can query, add, remove, and modify messages.
To retrieve the sixth message in this folder:
$message = $folder->get_message(6);
$message now contains a Mail::Internet object instance. With this object instance you can use all of the methods we just discussed.
If you need just the header of the same message:
$header = $folder->get_header(6);
No surprises here; a reference to a Mail::Header object instance is returned. See the Mail::Folder documentation for the other available methods.
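To make the "classical Unix mbox" layout itself more concrete, here is a hand-rolled sketch that does not use Mail::Folder at all. It assumes MailTools is installed and relies on a simplification: any line beginning with "From " followed by a space is treated as the start of a new message. The two-message mailbox is hypothetical sample data, and a real parser such as Mail::Folder handles many cases this sketch ignores.

```perl
use strict;
use warnings;
use Mail::Internet;

# Hypothetical two-message mbox, inlined so the sketch is self-contained.
my $mbox = <<'END';
From alice@example.com Sat Jul  1 10:00:00 2000
From: alice@example.com
Subject: First report

All systems normal.

From bob@example.com Sat Jul  1 11:00:00 2000
From: bob@example.com
Subject: Second report

Disk nearly full.
END

# Group lines into messages; a line starting with "From " opens a new one.
my @messages;
for my $line (split /^/m, $mbox) {
    push @messages, [] if $line =~ /^From /;
    push @{ $messages[-1] }, $line if @messages;
}

# Parse each message with Mail::Internet and collect its Subject: header.
my @subjects;
for my $lines (@messages) {
    shift @$lines;    # drop the "From " separator line itself
    my $message = Mail::Internet->new($lines);
    my $subject = $message->head->get("Subject");
    next unless defined $subject;
    chomp $subject;
    push @subjects, $subject;
}

print "$_\n" for @subjects;
```

Each array of lines handed to Mail::Internet here yields the same kind of object that get_message() returns, so the header and body methods discussed earlier apply unchanged.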
So now that you know how to slice and dice mailboxes and mail folders, what can you do with this newfound skill? In Perl for System Administration, we continue the discussion of dealing with email from Perl with two extended examples that are too long to print here: unsolicited commercial email (spam) analysis and support email augmentation. But these are just two possible places this skill can take you. I suspect you've already thought of several ways Perl programs that read your mail for you could be useful. Enjoy.
David N. Blank-Edelman is the Director of Technology at the Northeastern University College of Computer Science. He has spent the last 14 years of his life as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He has served as Senior Technical Editor for the Perl Journal and has written many magazine articles on world music. In his spare time, he studies mbira, a traditional Shona instrument from Zimbabwe.
O'Reilly & Associates will release Perl for System Administration in July, 2000.
- Sample Chapter 9, Log Files, is available free online.
- You can also look at the Table of Contents, the Index, and the Full Description of the book.









