I’m working on an online bookstore project that I’d like to automate as much as possible. Things like book insertion and deletion ought to happen automatically, with only an occasional need to touch code or HTML. My language of choice for this project is Python, with a touch of urllib and re. Since one of my sources of book titles and ISBNs is the O’Reilly book catalogue, I thought I’d share this little script with other ORA fans.
#!/usr/bin/python
import urllib, re, sys
try:
    page = urllib.urlopen('http://www.oreilly.com/catalog/prdindex.html')
except IOError, (errno, strerror):
    sys.exit("I/O error(%s): %s" % (errno, strerror))
title = ""
isbn = ""
price = ""
page = page.read()
# flatten the page onto one line so the slicing below is simpler
page = page.replace("\n", "")
page = page.replace("\r", "")
page = page.replace("> ", ">")
page = page.replace(" ", " ")
page = page[page.find("<b>Examples</b></td>") + len("<b>Examples</b></td>"):]
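while 1:
    # move past the start of the next table row
    page = page[page.find("<tr ") + len("<tr "):]
    if len(page) == 1:
        break
    # jump to the next catalogue link; when none is left, find() returns -1
    # and page[-1:] is a single character, which ends the loop
    page = page[page.find("http://www.oreilly.com/catalog/"):]
    if len(page) == 1:
        break
    # the title is the link text, up to the closing </a>
    page = page[page.find("\">"):]
    page = page[2:]
    title = page[:page.find("</a>")]
    # the next cell holds the ISBN; strip its hyphens
    page = page[page.find("\">"):]
    page = page[2:]
    isbn = page[:page.find("</td>")]
    isbn = isbn.replace("-", "")
    # the cell after that holds the price
    page = page[page.find("\">"):]
    page = page[2:]
    price = page[:page.find("</td>")]
    print title + ":" + isbn + ":" + price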
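
The script above imports re but does all its work with find() and slicing. For comparison, here is a minimal sketch of the same title/ISBN/price extraction done with a single regular expression. It is written in Python 3 syntax and runs against a made-up sample rather than the live page: the SAMPLE markup, the ROW pattern and the parse_catalog() helper are all assumptions for illustration, modelled only on what the slicing code expects (a link into /catalog/, followed by two cells whose opening tags end in `">`). The real catalogue page may be laid out differently.

```python
import re

# Hypothetical markup, shaped like the rows the slicing code expects.
# The titles, ISBNs and prices are invented for this example.
SAMPLE = '''
<tr bgcolor="#ffffff">
<td><a href="http://www.oreilly.com/catalog/examplebook/">Example Book</a></td>
<td class="isbn">0-596-00000-0</td>
<td class="price">$39.95</td>
</tr>
<tr bgcolor="#eeeeee">
<td><a href="http://www.oreilly.com/catalog/otherbook/">Other Book, 2nd Edition</a></td>
<td class="isbn">1-56592-000-X</td>
<td class="price">$24.95</td>
</tr>
'''

# One match per book: the link text, then the contents of the next two
# cells whose opening tag ends in ">. DOTALL lets "." cross line breaks,
# so the page does not need to be flattened first.
ROW = re.compile(
    r'<a href="http://www\.oreilly\.com/catalog/[^"]*">(?P<title>.*?)</a>'
    r'.*?">(?P<isbn>.*?)</td>'
    r'.*?">(?P<price>.*?)</td>',
    re.DOTALL,
)


def parse_catalog(html):
    """Return a list of (title, isbn, price) tuples found in html."""
    books = []
    for match in ROW.finditer(html):
        title = match.group("title").strip()
        isbn = match.group("isbn").strip().replace("-", "")
        price = match.group("price").strip()
        books.append((title, isbn, price))
    return books


# Same "title:isbn:price" output format as the original script.
for title, isbn, price in parse_catalog(SAMPLE):
    print(title + ":" + isbn + ":" + price)
```

The pattern is deliberately as loose as the original slicing, so it shares the same weakness: any change to the order of the cells breaks it.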

