Hacking Books with Safari Web Services
by Paul Bausch
My name is Paul, and I'm an information junkie. Like a lot of developers, I've used my coding skills to help feed my information habit. I have scripts scraping web sites, reading RSS files, writing web pages, and sending me emails and instant messages with the latest news from over a hundred different sites. I feel like this constant flow of data keeps me up to date with the Web.
It sometimes feels otherwise, but I periodically need to remind myself that not every bit of information is available on the Web. I sometimes forget that those rectangular objects housing printed, pressed wood pulp on my bookshelf contain some valuable information, too. The problem is, I can't screen-scrape a book, so it's impossible to work the data trapped in that legacy format into my personalized flow of information. And this leads me to wonder, what critical information am I missing?
The Safari API is helping me answer that question, and I'll demonstrate what I mean with a quick sample application. But first, what are Safari web services?
Safari Web Services
Safari Books Online is a joint venture between O'Reilly Media, Inc. and the Pearson Technology Group. For the past three years, they've been making offline technical books available on the Web. This means developers can search through the entire contents of thousands of technical books just like anything else on the Web, looking for the code fragment or explanation that solves the task at hand.
This July, Safari took things a step further by releasing their web services API, giving developers programmatic access to their data. This means that not only can developers search through technical books and read the contents through a web browser--they can route that Safari data anywhere they need it.
The Safari API has a REST interface, with request URLs returning XML over HTTP. The response XML is organized by <book> elements with child elements containing all of the data about that particular book.
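Since every request is just a URL, a small helper can assemble one. The following is a minimal sketch rather than anything from the Safari documentation: the endpoint and the parameter names (token, search, sort, sortOrder, view) are the ones used by the script later in this article, while the helper names url_escape and build_request_url and the token value are made up for illustration.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# Assumes plain ASCII/byte strings; wide characters would need encoding first.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Assemble a request URL from a base address and an ordered list of
# name/value pairs (hypothetical helper, not part of any Safari library).
sub build_request_url {
    my ($base, @pairs) = @_;
    my @query;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @query, url_escape($name) . '=' . url_escape($value);
    }
    return $base . '?' . join('&', @query);
}

my $url = build_request_url(
    'http://safari.oreilly.com/xmlapi/',
    token     => 'YOUR-TOKEN',        # placeholder, not a real token
    search    => 'RSS feeds',
    sort      => 'publishingDate',
    sortOrder => 'desc',
    view      => 'section',
);
print "$url\n";
```

Escaping each value keeps spaces, quotes, and ampersands in a search phrase from corrupting the query string.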
Show Me the Data
Exactly what data is available via the Safari API? There are three general areas of information:
- Book Catalog Info: Anything you can find out by picking up the book at a bookstore: title, author, publisher, etc.
- Book Structure Info: Information about the book's contents: chapter titles, section titles, and the order in which they appear within the book.
- Book Contents: The API provides access to a limited amount of actual text from the book: section extracts that give context to search results, and section previews (the first paragraph or so from a chapter or section).
The first area of data, bibliographic/catalog information, has been available to developers for many books through Amazon's web services. But opening the book cover programmatically hasn't been possible before. With this in mind, let's take a look at a quick, practical application that makes use of some book content from the Safari API.
RSS Example Application
RSS is an XML format for web publishers that standardizes how site content is published and consumed. If a site has an RSS feed available, I can quickly add it to my list of watched feeds, and I'm notified whenever there is new content available. Safari doesn't provide access to their data as RSS, but with their new API it's fairly simple to transform the API responses into RSS so I can incorporate their data into my list of watched sites.
Editor's Note: Since this article was written Safari has upgraded its RSS capabilities, and you may now subscribe to Safari RSS feeds for the new and most popular titles. See the Safari Tools page for more information.
For example, imagine you're interested in RSS and you'd like to stay up to date with what the offline technical book world is publishing on the subject. Because the Safari API lets you query to find any book sections that mention the phrase "RSS," you can quickly build an RSS feed with this information--ordered by book publication date. This means that any new technical book published through Safari that mentions the phrase "RSS" will appear in your newsreader, along with a brief extract from that section of the book.
Here's how you can put this custom RSS feed together with Perl. If you use this script, you'll need a couple of non-standard Perl modules to make the API request and transform the results: LWP::Simple and XML::XSLT. You'll also need a Safari developer token, which you can pick up for free at the Safari Affiliate site. Create the first file, called safari_rss.pl, and add the following code:
#!/usr/bin/perl
# safari_rss.pl
# Accepts a Safari search query and converts the response to RSS.
# Usage: safari_rss.pl <Query>
#
# You can request a developer token, and read the full documentation
# for the Safari API at http://safari.oreilly.com/affiliates/

use strict;
use warnings;
use XML::XSLT;
use LWP::Simple;
use URI::Escape;    # part of the URI distribution, which LWP depends on

# Set the location of XSL file
my $xslfile = "safari_rss.xsl";

# Take the Safari search query from the command line
my $search = join(' ', @ARGV) or die "Usage: safari_rss.pl <Query>\n";

# Set some variables for the request
my $token     = "insert your developer token";
my $sort      = "publishingDate";
my $sortorder = "desc";
my $view      = "section";

# Assemble the API request URL, escaping the query so spaces,
# quotes, and ampersands survive the trip
my $safari_request = "http://safari.oreilly.com/xmlapi/?";
$safari_request .= "token=$token";
$safari_request .= "&search=" . uri_escape($search);
$safari_request .= "&sort=$sort";
$safari_request .= "&sortOrder=$sortorder";
$safari_request .= "&view=$view";

# Make the request; get() returns undef on failure
my $safari_response = get($safari_request);
die "No response from the Safari API\n" unless defined $safari_response;

# Transform the response
my $xslt = XML::XSLT->new($xslfile, warnings => 1);
$xslt->transform($safari_response);

# Print the results
# print "Content-type: text/xml\n\n";
print '<?xml version="1.0" ?>';
print $xslt->toString;
$xslt->dispose();
This script accepts a query, assembles the appropriate Safari API request URL, and transforms the response with an XSL stylesheet called safari_rss.xsl. The stylesheet itself maps the Safari API response to the appropriate RSS tags:
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" />

  <xsl:template match="safari">
    <rss version="2.0">
      <channel>
        <title>Safari - Search Results</title>
        <link>http://safari.oreilly.com/</link>
        <description>The latest book sections from Safari.</description>
        <language>en-us</language>
        <image>
          <url>http://safari.oreilly.com/portals/oreilly/images/logo.gif</url>
          <title>Safari Bookshelf</title>
          <link>http://safari.oreilly.com/</link>
        </image>
        <xsl:apply-templates select="book/section"/>
      </channel>
    </rss>
  </xsl:template>

  <xsl:template match="book/section">
    <item>
      <title>
        <xsl:value-of select="../title"/>: <xsl:value-of select="title"/>
      </title>
      <link><xsl:value-of select="url"/></link>
      <description>
        <xsl:value-of select="extract"/>
      </description>
      <xsl:apply-templates select="../subjectset/subject"/>
    </item>
  </xsl:template>

  <xsl:template match="subject">
    <category><xsl:value-of select="."/></category>
  </xsl:template>

  <xsl:template match="error">
    <xsl:value-of select="/"/>
  </xsl:template>
</xsl:stylesheet>
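To make the mapping concrete, here is a rough sketch of what the book/section template does, written as plain Perl instead of XSLT. The field names (title, url, extract, subjects) follow the elements the stylesheet selects; the helper names and the sample record are invented for illustration and do not come from a real API response.

```perl
use strict;
use warnings;

# Escape the five XML special characters so text is safe inside an element.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Build one RSS <item> from a book record and one of its sections,
# mirroring the stylesheet: "Book title: Section title", link, extract,
# and one <category> per subject.
sub section_to_item {
    my ($book, $section) = @_;
    my $title = "$book->{title}: $section->{title}";
    my @lines = (
        '<item>',
        '  <title>' . xml_escape($title) . '</title>',
        '  <link>' . xml_escape($section->{url}) . '</link>',
        '  <description>' . xml_escape($section->{extract}) . '</description>',
    );
    push @lines, '  <category>' . xml_escape($_) . '</category>'
        for @{ $book->{subjects} || [] };
    push @lines, '</item>';
    return join("\n", @lines) . "\n";
}

# Made-up sample data standing in for one <book> and one <section>.
my %book    = (title => 'Example Book', subjects => ['Web Development']);
my %section = (
    title   => 'Feeds & Readers',
    url     => 'http://example.com/section',
    extract => 'A short extract.',
);
print section_to_item(\%book, \%section);
```

The stylesheet remains the simpler route for the script in this article; the sketch only shows which piece of the response ends up in which RSS tag.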
With these two pieces in place, you can call your RSS-generating script, like so:
perl safari_rss.pl insert query
To create a feed about RSS specifically, I could run the script and send the output to a text file:
perl safari_rss.pl RSS > safari_rss.xml
The final step is setting this script to run on a web server every few days with cron or the Windows task scheduler. Once the feed is up and running, add it to your RSS newsreader of choice. Figure 1 shows what the feed looks like in the newsreader NetNewsWire.
Figure 1. Safari feed in NetNewsWire
Each item title includes the title of a book, and the title of the section that mentions the term "RSS." The item detail is an extract from the section that gives a bit more information. The link takes readers to a page at Safari Books that lets anyone read a bit more from that section, or add the book to a Safari Bookshelf if they have a subscription.
This example just scratches the surface of the kinds of RSS feeds you can put together. Say you're a fan of the Perl function split. You could put together a feed that shows any new books with code fragments that use the function:
perl safari_rss.pl '(CODE "split") AND (CATEGORY=itbooks.prog.perl)'
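The parentheses and quotes in a query like this one are special characters in most shells, so it can be easier to compose the query in Perl. This is a minimal sketch: the helper name code_query is hypothetical, and the only query syntax it assumes is what appears in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: compose a search expression that looks for a code
# fragment within one category, using the syntax from the example query.
sub code_query {
    my ($fragment, $category) = @_;
    return sprintf '(CODE "%s") AND (CATEGORY=%s)', $fragment, $category;
}

my $query = code_query('split', 'itbooks.prog.perl');
print "$query\n";

# The list form of system() bypasses the shell, so the query can be handed
# to the script as a single argument without any extra quoting:
# system($^X, 'safari_rss.pl', $query) == 0
#     or die "safari_rss.pl failed: $?\n";
```

The system() call is left commented out because it needs safari_rss.pl and a valid developer token to do anything useful.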
Of course, the Safari API is not just a tool for information junkies like me to stay on top of a previously untouchable mass of data. As with all web services, the power that comes with opening an API is that it lets outsiders move the data in ways the maintainers may not have envisioned--creating new applications, and giving that data new value.
Paul Bausch is a co-creator of the weblog software Blogger, maintains a directory of Oregon-based weblogs at ORblogs.com, and is the author of the forthcoming Yahoo! Hacks.