

Book:   Spidering Hacks
Subject:   Spidering Hacks Review
Date:   2004-01-15 09:27:24
From:   Bill Day
Rating:  5 stars

Spidering Hacks


Authors: Kevin Hemenway & Tara Calishain


Publisher: O’Reilly & Associates


Price: $24.95


Pages: 402


Web site: <http://www.oreilly.com/catalog/spiderhks/>


Reviewed by Bill Day, Grand Rapids (Michigan) Perl Mongers


4.5 stars (on a 5-star scale). This book is not perfect; the authors may have tried to cover too much material. The material is very time-sensitive, which is why the book had to be rushed together, and it will have little value in five years. I wanted to give the book a higher rating, and I tried but could not think of a better way to present the material in 400 pages; still, there are just too many rough edges for a 5-star book.


As a member of O’Reilly’s “Hacks” series, “Spidering Hacks” is different from the typical O’Reilly book: it presents breadth of topic rather than depth. The format is 100 hacks (mostly Perl on Linux, with the odd Python, Java, or Windows hack) organized into 6 chapters; some were written by Hemenway & Calishain, and many by guest authors. The number of authors leads to a variety of styles in both English and Perl. If you treat the book as a super magazine (a collection of time-sensitive short articles), you won’t be disappointed.


Chapter 1 – Walking Softly (Hacks 1-7)


Chapter 1 provides general guidelines on spider/scraper etiquette and good practices, which the rest of the book seems to ignore.
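Two widely used etiquette practices, honoring robots.txt and pausing between requests, can be sketched in a few lines. This is a minimal illustration using Python’s standard urllib.robotparser, not code from the book (whose examples are mostly Perl); the user-agent string, the function names, and the one-second default delay are hypothetical choices.

```python
import time
import urllib.robotparser

# Hypothetical user-agent; a polite spider identifies itself and a contact.
USER_AGENT = "ExampleSpider/0.1 (+mailto:you@example.com)"

def build_rules(robots_lines):
    """Parse robots.txt content that has already been fetched, as a list of lines."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_lines)
    return rules

def polite_urls(rules, urls, default_delay=1.0, sleep=time.sleep):
    """Yield only URLs robots.txt allows, pausing between successive requests.

    Uses the site's Crawl-delay when present, else a hypothetical default.
    """
    delay = rules.crawl_delay(USER_AGENT)
    if delay is None:
        delay = default_delay
    first = True
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # skip anything robots.txt disallows
        if not first:
            sleep(delay)  # throttle so the spider does not hammer the server
        first = False
        yield url  # the caller would fetch the page here
```

Passing `sleep` as a parameter is only a convenience so the throttling can be observed without actually waiting.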


Chapter 2 – Assembling a toolkit (Hacks 8-32)


An overview of several modules and techniques with working examples. More experienced Perl mongers may find this material remedial.


Chapter 3 – Collecting media files (Hacks 33-42)


The hacks on POP3 attachments and Usenet may be worth the price of the book for those trying to solve a particular problem.
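To give a rough idea of what pulling attachments out of mail involves, here is a minimal sketch using Python’s standard poplib and email modules rather than the book’s Perl. The function names are hypothetical, the POP3 helper needs a real server and credentials, and only the parsing function runs offline.

```python
import email
import poplib
from email import policy

def extract_attachments(raw_message):
    """Return (filename, bytes) pairs for each attachment in a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():  # skips the main text body
        name = part.get_filename() or "unnamed"  # arbitrary fallback name
        found.append((name, part.get_payload(decode=True)))
    return found

def fetch_raw_messages(host, user, password):
    """Hypothetical helper: yield every message in a POP3 mailbox as raw bytes."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        for i in range(1, count + 1):
            _resp, lines, _octets = conn.retr(i)  # messages stay on the server
            yield b"\r\n".join(lines)
    finally:
        conn.quit()
```

Keeping the parsing separate from the download makes it possible to test the attachment logic on saved messages without touching a mail server.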


Chapter 4 – Gleaning data from databases (Hacks 43-89)


Over half the book is dedicated to this chapter. Initially it appears that these are very specific solutions for a narrow audience, but closer reading reveals a variety of techniques that can be used in many circumstances.


Chapter 5 – Maintaining your collections (Hacks 90-93)


Not much here. Cron is covered much better in other works.


Chapter 6 – Giving back to the world (Hacks 94-100)


Essentially, this chapter covers how to be nice to spiders. The inclusion of Net::AIM here seems arbitrary. Hack #100, “Going beyond the book,” is nothing but fluff.


An example of how I used the book may be illustrative. I wanted to scrape TV listings, but hack #73, “Scraping TV Listings,” has been made obsolete by a change to tvguide.com. I was able to quickly use the toolkit presented in chapter 2 to scrape one of the many other web sites with TV listings. I expect this to be typical: sites change, and spiders and scrapers need to adapt.
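The quick adaptation described above usually comes down to changing what the scraper looks for. The following is a minimal sketch, assuming a hypothetical listings page that marks show titles with a `show-title` class (this is not tvguide.com’s real markup or the book’s code), written with Python’s standard html.parser instead of the book’s Perl modules.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClassTextScraper(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being captured
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace so "<b>News</b> at 8" becomes "News at 8".
            self.results.append(" ".join(" ".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

def scrape_listings(html, title_class="show-title"):
    """Return show titles; "show-title" is a hypothetical class name."""
    parser = ClassTextScraper(title_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Keeping the site-specific detail in one parameter is what makes retargeting a changed site cheap; pages with sloppily unclosed tags would call for a more forgiving parser.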


Spidering Hacks is an odd collection of articles that span the remedial to intermediate skill levels. Nobody will benefit from all 100 hacks, but most of us will find $24.95 of value in the hacks that make us say “How cool!”