I'm working on an online bookstore project that I'd like to automate as much as possible. Adding and removing books should happen automatically, with only the occasional need to touch code or HTML. I'm writing it in Python, with a touch of the urllib and re modules. Since the O'Reilly book catalogue is one of my sources of book titles, ISBNs, and prices, I thought I'd share this little script, which pulls the title, ISBN, and price of every book from the catalogue's product index, with other ORA fans.
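Deciding which books to add or remove comes down to comparing the catalogue you just scraped with the one you imported last time. Here is a minimal sketch, assuming each snapshot is a dict that maps an ISBN to a `(title, price)` pair. The function name `catalog_diff` and the sample data are hypothetical.

```python
def catalog_diff(old, new):
    """Compare two catalogue snapshots, each a dict mapping ISBN -> (title, price).

    Returns three sorted lists of ISBNs: books to insert, books to delete,
    and books whose title or price changed.
    """
    old_isbns, new_isbns = set(old), set(new)
    to_insert = sorted(new_isbns - old_isbns)  # appeared since the last run
    to_delete = sorted(old_isbns - new_isbns)  # no longer in the catalogue
    # Present in both snapshots, but the (title, price) pair differs.
    to_update = sorted(i for i in old_isbns & new_isbns if old[i] != new[i])
    return to_insert, to_delete, to_update


if __name__ == "__main__":
    # Hypothetical snapshots; titles and prices are made up.
    yesterday = {"0306406152": ("Title A", "$10.00"), "080442957X": ("Title B", "$20.00")}
    today = {"080442957X": ("Title B", "$18.00"), "9780306406157": ("Title C", "$30.00")}
    print(catalog_diff(yesterday, today))
    # (['9780306406157'], ['0306406152'], ['080442957X'])
```

Keying on the ISBN rather than the title keeps the comparison stable when a title is reworded between runs.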
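```python
#!/usr/bin/env python3
# Print "title:isbn:price" for every book in the O'Reilly product index.
import re
import sys
import urllib.request

URL = "http://www.oreilly.com/catalog/prdindex.html"

try:
    with urllib.request.urlopen(URL) as response:
        # Latin-1 is an assumed page encoding; it never raises on decode.
        page = response.read().decode("latin-1")
except OSError as err:  # URLError and HTTPError are subclasses of OSError
    sys.exit("I/O error: %s" % err)

# Normalize whitespace so the markers below can be found with plain
# substring searches.
page = re.sub(r"[\r\n]", "", page)
page = page.replace("> ", ">")
page = page.replace("&nbsp;", " ")

# Skip past the "Examples" column heading of the catalogue table.
marker = "<b>Examples</b></td>"
page = page[page.find(marker) + len(marker):]

while True:
    # Each book sits in its own table row; stop when no rows
    # or catalog links are left.
    pos = page.find("<tr ")
    if pos == -1:
        break
    page = page[pos + len("<tr "):]
    pos = page.find("http://www.oreilly.com/catalog/")
    if pos == -1:
        break
    page = page[pos:]
    # The title is the link text, right after the '">' closing the href.
    page = page[page.find('">') + 2:]
    title = page[:page.find("</a>")]
    # The next cell holds the ISBN; strip its hyphens.
    page = page[page.find('">') + 2:]
    isbn = page[:page.find("</td>")].replace("-", "")
    # The cell after that holds the price.
    page = page[page.find('">') + 2:]
    price = page[:page.find("</td>")]
    print(title + ":" + isbn + ":" + price)
```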
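For comparison, the find-and-slice loop can be collapsed into a single regular expression that looks for the same markers in the same order: a `<tr ` row, a catalog link, the link text, and then the contents of the next two `">...</td>` cells. This is a sketch that assumes rows are laid out exactly as the slicing code implies. `ROW` and `extract_books` are hypothetical names. The function expects a page that has already been trimmed past the `<b>Examples</b></td>` heading, as in the script.

```python
import re

# Mirrors the slicing steps: a row start, a catalog link, the link text
# (title), then the contents of the next two cells (ISBN and price).
ROW = re.compile(
    r'<tr .*?http://www\.oreilly\.com/catalog/.*?">(.*?)</a>'
    r'.*?">(.*?)</td>'
    r'.*?">(.*?)</td>',
    re.S,
)


def extract_books(page):
    """Yield (title, isbn, price) tuples, with hyphens stripped from the ISBN."""
    for title, isbn, price in ROW.findall(page):
        yield title, isbn.replace("-", ""), price
```

Each non-greedy `.*?` stops at the first following marker, just as `find` does. As in the original loop, a row without a catalog link is silently skipped.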
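The script's output uses `:` as the field separator, but book titles often contain colons themselves, usually before a subtitle. Whatever consumes the output is therefore safer splitting from the right, because the last two fields are always the ISBN and the price. Here is a minimal sketch with hypothetical helper names `parse_line` and `isbn_ok`. The second helper adds a check-digit test using the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) rules. The original script does not perform this test, but it is a cheap way to catch rows the scraper sliced incorrectly.

```python
def parse_line(line):
    """Split one 'title:isbn:price' line, allowing colons inside the title."""
    title, isbn, price = line.rstrip("\n").rsplit(":", 2)
    return title, isbn, price


def isbn_ok(isbn):
    """Verify the check digit of a hyphen-free ISBN-10 or ISBN-13."""
    isbn = isbn.upper()
    if len(isbn) == 10 and isbn[:9].isdecimal() and (isbn[9].isdecimal() or isbn[9] == "X"):
        digits = [int(c) for c in isbn[:9]]
        digits.append(10 if isbn[9] == "X" else int(isbn[9]))
        # ISBN-10: weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(isbn) == 13 and isbn.isdecimal():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0
    return False
```

`rsplit` raises `ValueError` on a line that has fewer than two colons. That is a reasonable signal that the scrape went wrong for that row.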
Jacek Artymiak started his adventure with computers in 1986 with a Sinclair ZX Spectrum. He's been using various commercial and Open Source Unix systems since 1991. Today, Jacek runs devGuide.net, writes and teaches about Open Source software and security, and tries to make things happen.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.