
Article:   Tapping RSS with Shell Scripts
Subject:   a few things to beware of
Date:   2004-03-13 10:12:14
From:   jnazario
a few things to beware of, both with the slashdot RSS feed and with parsing RSS using regular expressions.


slashdot has server load problems (they are quite popular; imagine a sustained, several-year slashdot effect), and one way they try to deal with it is by blocking people who fetch their RSS feed more than once every 30 minutes. hence, if you use this login script and log in more than once every half hour (or if this is a system-wide thing ...), you're toast. instead, use cron to fetch the RSS once an hour (IIRC they rebuild their RSS only hourly, like most sites) and have this script read from a local cache. that way you always get headlines and never trip the block.
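here's a minimal sketch of the cache idea in python, not a drop-in replacement for the article's script. the feed URL, the cache path, and the function names (`cache_is_fresh`, `download`, `get_feed`) are all placeholders i made up for illustration; swap in the real feed URL and whatever path suits your system.

```python
import os
import time
import urllib.request

# Hypothetical values: substitute the real feed URL and your preferred cache path.
FEED_URL = "http://example.org/feed.rss"
CACHE_PATH = os.path.expanduser("~/.rss_cache.xml")
MAX_AGE = 60 * 60  # refetch at most once an hour


def cache_is_fresh(path, max_age, now=None):
    """Return True if path exists and was modified less than max_age seconds ago."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age


def download(url):
    """Fetch the raw bytes at url."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def get_feed(url=FEED_URL, path=CACHE_PATH, max_age=MAX_AGE, fetch=download):
    """Return the feed bytes, touching the network only when the cache is stale."""
    if not cache_is_fresh(path, max_age):
        data = fetch(url)
        # Write to a temp file first so a failed download never clobbers the cache.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    with open(path, "rb") as f:
        return f.read()
```

calling `get_feed()` from both an hourly cron job and the login script means logins inside the hour are served from disk, so the remote site sees at most one request per `MAX_AGE` seconds no matter how often you log in.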


secondly, parsing RSS with regular expressions is error-prone: it breaks as soon as the feed's formatting changes. use a real XML parser instead. lightweight ones exist for perl and for python:


http://www-106.ibm.com/developerworks/web/library/w-rss.html


http://www-106.ibm.com/developerworks/webservices/library/ws-pyth11.html


these will be far more flexible and will work for any valid RSS/XML file.
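to give a rough idea of what that looks like, here's a small sketch using python's standard library `xml.etree.ElementTree` (my choice for illustration, not necessarily what the linked articles use). the helper names `local_name` and `headlines` are hypothetical. it matches elements by local name, on the assumption that you want the same code to cope with both plain RSS and the namespaced RDF flavor.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if present, from a tag name."""
    return tag.rsplit("}", 1)[-1]


def headlines(xml_bytes):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_bytes)
    result = []
    for elem in root.iter():
        if local_name(elem.tag) != "item":
            continue
        # Map each direct child's local name to its text, e.g. {"title": ..., "link": ...}.
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        result.append((fields.get("title", ""), fields.get("link", "")))
    return result
```

because the parser handles entities, attribute order, and whitespace for you, none of those can break the extraction the way they would break a regex tuned to one particular layout of the feed.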


hope this helps.