I recently needed to filter and process some Atom feeds. I know enough XML to process them with my own SAX filter, but this seemed like a good opportunity to use the XML::Atom module. Fortunately, it was very easy.
Installation
Installation went well from the CPAN shell. It required a few other modules I didn’t have installed, but this was on a new machine and they all seemed pretty standard. I also didn’t have libxml2 installed, but that’s also a common dependency.
Usage
XML::Atom provides several objects. The important ones for
my purposes represent feeds and entries. When I publish an Atom feed, it’s
a list of entries. When you subscribe to my feed, you get that list of
entries, usually newest first.
My project needed to work through a list of feeds to find new entries to
process. XML::Atom’s documentation pointed me in the right
direction, though I had to make a few guesses.
Working with Feeds
Suppose I have a feed. Suppose it’s just an XML file on my local hard
drive. I need an XML::Atom::Feed object:
use XML::Atom::Feed;
my $feed = XML::Atom::Feed->new( 'sample.xml' );
That worked. A feed has several attributes, including a title, and the
Feed object provides accessors for them. So far, so good. What can I do
with it? Let me check the title:
warn $feed->title();
Working with Entries
More importantly, a feed contains entries. My goal was to process those somehow. The code is again simple:
for my $entry ($feed->entries())
{
    # process each XML::Atom::Entry here
}
What’s in $entry? XML::Atom::Entry objects.
This is where the documentation started to get a little sketchy, but the
code is straightforward and sensible. A little guessing worked out
fine.
My application must process each entry exactly once. A feed may get refreshed
once a day, but newer versions might include already-processed entries.
Fortunately, the Atom specification includes a unique identifier for each
entry. It’s the responsibility of the feed creator to provide these. It’s
easy to fetch them from the Entry objects, though I suspect
that I’ll hash them just for a little extra paranoia:
my $id = $entry->id();
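Here is a minimal sketch of that once-only check. The SHA-1 digest (from the core Digest::SHA module) and the is_new_entry() helper are my own assumptions for illustration, not part of XML::Atom, and a real application would keep %seen on disk between runs rather than in memory:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: true the first time an entry id is seen, false after.
sub is_new_entry
{
    my ($seen, $id) = @_;
    my $key = sha1_hex(encode_utf8($id));   # fixed-length key for any id string
    return !$seen->{$key}++;                # post-increment: the old value decides
}

my %seen;   # assumption: would be persisted (e.g. in a DBM file) between runs
for my $id ('tag:example.com,2005:1', 'tag:example.com,2005:2',
            'tag:example.com,2005:1')
{
    print "$id\n" if is_new_entry(\%seen, $id);   # prints the first two ids only
}
```

Hashing isn't required for correctness, since the Atom id itself is meant to be unique, but it gives every key the same length no matter what the feed creator put in the id.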
My application also needs the entry titles:
my $title = $entry->title();
The really important part of my application uses the content of the
entry. That’s the main text. This is where the documentation was unhelpful
and I had to read the source. It turns out that there’s a
content() method:
my $content = $entry->content();
That didn’t give me text; it gave me an XML::Atom::Content object. I wanted the text, which comes from its body() method:
my $body = $content->body();
That’s all the pieces I needed to build my application. It’s only a handful of method calls, and I’m pleased.
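Putting those calls together gives something like the following minimal sketch. The sample feed is invented, and it is passed to new() as a scalar reference (XML::Atom::Feed parses that as the XML itself) so the script runs without touching anyone's server. The %seen hash stands in for whatever persistent store tracks processed ids, and the check for a missing content element is my own precaution:

```perl
use strict;
use warnings;
use XML::Atom::Feed;

# Invented sample Atom 1.0 document; in practice this comes from a file or URL.
my $xml = <<'ATOM';
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>tag:example.com,2005:feed</id>
  <updated>2005-08-01T00:00:00Z</updated>
  <entry>
    <title>First post</title>
    <id>tag:example.com,2005:1</id>
    <updated>2005-08-01T00:00:00Z</updated>
    <content type="text">Hello, Atom.</content>
  </entry>
</feed>
ATOM

my %seen;       # assumption: stands in for a store persisted between runs
my @processed;  # [ title, body ] for each entry handled in this run

my $feed = XML::Atom::Feed->new( \$xml );   # a scalar ref is parsed as XML
for my $entry ($feed->entries())
{
    next if $seen{ $entry->id() }++;        # already processed: skip it

    my $content = $entry->content();        # XML::Atom::Content, or undef
    my $body    = $content ? $content->body() : '';
    push @processed, [ $entry->title(), $body ];
}

print "$_->[0]: $_->[1]\n" for @processed;
```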
Enhancements
It’s a little unrealistic to expect that I’ll only ever parse local feeds. Parsing local files is useful during development, so as not to punish someone’s web site for my programming errors, but it would be nice to parse live URLs too. How does that work? Here’s what I wanted to write:
my $feed = XML::Atom::Feed->new( 'http://example.com/some_feed.xml' );
That actually worked. The documentation made me think that it required a
URI object, but the version I tested (0.25) handled this case
nicely.
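If you'd rather control the request yourself, for example to check the HTTP status or send a descriptive User-Agent (in the same spirit of not punishing someone’s web site), one option is to fetch the document with the core HTTP::Tiny module and hand XML::Atom::Feed a scalar reference. This is a sketch under my own assumptions: load_feed() and the agent string are hypothetical, and https URLs need IO::Socket::SSL installed for HTTP::Tiny to work:

```perl
use strict;
use warnings;
use HTTP::Tiny;
use XML::Atom::Feed;

# Hypothetical helper: parse a local file, or fetch an http(s) URL first.
sub load_feed
{
    my ($source) = @_;

    # Anything that doesn't look like a URL is treated as a filename.
    return XML::Atom::Feed->new( $source ) unless $source =~ m{^https?://}i;

    my $res = HTTP::Tiny->new( agent => 'feed-sketch/0.1' )->get( $source );
    die "Fetching $source failed: $res->{status} $res->{reason}\n"
        unless $res->{success};

    my $xml = $res->{content};
    return XML::Atom::Feed->new( \$xml );   # scalar ref: parse the body as XML
}
```

Usage is the same for both cases: load_feed('sample.xml') during development, then load_feed('http://example.com/some_feed.xml') once the code behaves.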
I put off doing this project for a while because I’d never consumed Atom data before. It’s surprising how easy it was.

Nice article!!!!
I love examples like this. I may not need it now, but I will bookmark it and revisit it when I do. It really helps to have a little code to work from when hacking in Perl. :)
Thanks chromatic.
Even better is XML::Feed, which does most of this in an abstract way for both Atom and RSS feeds. These related modules definitely make life a lot easier.
this is an example of perl/xml crap ya know???
I think your article is cool, but only a few sites work in my test code.
oh, yeah, xml-feed is great!