In my last post on rewriting my podgrabber utility, I promised to post the in-progress rewrite to a Bazaar repository. You can branch from here if you’re interested. In this post, I’m going to discuss the approach I’m taking for moving files from a webserver onto a computer, and from there onto a portable media device.

The current version of podgrabber has a download manager which takes a URL and saves the file to a particular directory. It has a small amount of extensibility, built in a very clunky way: it looks at the URL to decide how to download the file. Once the files are on my computer, a single function synchronizes them between my computer and my portable media device.

This approach works, but it doesn’t treat the problems cohesively, and it isn’t very extensible. Adding a new file source (such as FTP) would probably involve a lot of cut and paste and an ever-growing download method. And synchronizing downloaded files to anything other than an MP3 player that shows up as a USB disk drive would be quite painful.

As I was contemplating rewriting podgrabber, it occurred to me that downloading files from a webserver and getting files onto an MP3 player are really the same problem: moving files from one place to another. So, we have some source of media files (let’s call it a media store) which knows which files it has available, can perhaps delete some of those files, and can perhaps add new files to itself. Examples of media stores are RSS feeds, a directory of downloaded files on a computer, and a group of media files on a portable media device. Here is the code I have so far for an RSS media store and a hard-disk-based media store::

import urllib
import mediaFile
import os
import xml.parsers.expat
from elementtree import ElementTree

class ListFailed(Exception):
    """Exception raised if listing files in a mediaStore fails"""
    pass

class DeleteFailed(Exception):
    """Exception raised if deleting a file in a mediaStore fails"""
    pass

class AddFailed(Exception):
    """Exception raised if copying a file to a mediaStore fails"""
    pass

class NoPath(Exception):
    """Exception raised if no getStorePath method has been implemented"""
    pass

class UnspecifiedMediaFileType(Exception):
    """Exception raised if a mediaFile type hasn't been specified"""
    pass

class IMediaStore:
    def getMediaFileType(self):
        """return the class of media file this mediaStore houses"""
        raise UnspecifiedMediaFileType
    def list(self):
        """return a list of mediaFile objects in this mediaStore"""
        raise ListFailed
    def deleteFile(self, fileName):
        """remove a file from this mediaStore"""
        raise DeleteFailed
    def getStorePath(self):
        """return the prefix this store has been configured with"""
        raise NoPath
    def addFile(self, mediaFile):
        """copy specified mediaFile into this mediaStore

        This method breaks a pure 'interface' approach by actually
        implementing the functionality.  This addMethod implementation
        should be the same for any mediaStore, so everyone should just
        use this 'interface'."""
        # open in binary mode ("wb") so media bytes aren't mangled by newline translation
        outfile = self.getMediaFileType()(os.path.join(self.getStorePath(), mediaFile.getFileName()), "wb")
        while 1:
            chunk = mediaFile.read(8 * 1024)
            if not chunk:
                break
            outfile.write(chunk)
        outfile.finalizeWrite()

class RSSMediaStore(IMediaStore):
    def __init__(self, feedUrl, proxy=None):
        self.feedUrl = feedUrl
        if proxy is None: self.proxy = {}
        else: self.proxy = proxy
        self.mediaFileType = mediaFile.HTTPFile

    def getStorePath(self):
        return self.feedUrl

    def getMediaFileType(self):
        return self.mediaFileType

    def list(self):
        opener = urllib.FancyURLopener(self.proxy)
        f = opener.open(self.feedUrl)
        feed_text = f.read()
        try:
            feed_tree = ElementTree.fromstring(feed_text)
        except xml.parsers.expat.ExpatError:
            return []
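        item_list = []
        for item in feed_tree.findall("*/item"):
            enclosure = item.find("enclosure")
            # skip items without an enclosure URL instead of building a bogus "No URL" file
            if enclosure is None or not enclosure.get("url"):
                continue
            # pass the store's proxy settings through; HTTPFile._init reads self.proxy
            item_list.append(mediaFile.HTTPFile(enclosure.get("url"), proxy=self.proxy))
        return item_list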

class FileSystemMediaStore(IMediaStore):
    def __init__(self, directory):
        self.directory = directory
        self.mediaFileType = mediaFile.DiskFile

    def getStorePath(self):
        return self.directory

    def getMediaFileType(self):
        return self.mediaFileType

    def list(self):
        files = [self.mediaFileType(f) for f in [os.path.join(self.directory, ff) for ff in os.listdir(self.directory)] if os.path.isfile(f)]
        return files

class MTPMediaStore(IMediaStore):
    pass

I’ve created an interface class called IMediaStore, mostly to keep track of what I want a media store to do. Putting the common methods in the IMediaStore class made me feel like I was coding in Java, but it really helps keep things straight.
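
For comparison, here is a minimal sketch of the same interface idea written for Python 3 with the standard library’s abc module, so a subclass that forgets to implement list() fails when it is instantiated rather than when the method is called. It assumes a single hypothetical openForWrite() hook in place of the getMediaFileType()/getStorePath() pair, and it shows that the chunked loop in addFile is what shutil.copyfileobj already does for any pair of objects with read() and write(). MemoryFile and MemoryMediaStore are illustrative stand-ins, not part of podgrabber.

```python
import abc
import io
import shutil


class MediaStore(abc.ABC):
    """Python 3 sketch of the IMediaStore idea using abstract methods."""

    @abc.abstractmethod
    def list(self):
        """Return the media files this store holds."""

    @abc.abstractmethod
    def openForWrite(self, fileName):
        """Hypothetical hook: return a writable media file named fileName."""

    def addFile(self, mediaFile):
        """Shared implementation, like IMediaStore.addFile."""
        outfile = self.openForWrite(mediaFile.getFileName())
        # copyfileobj calls mediaFile.read(8192) until it returns an empty
        # chunk and passes each chunk to outfile.write(): the same loop
        shutil.copyfileobj(mediaFile, outfile, 8 * 1024)
        outfile.finalizeWrite()


class MemoryFile:
    """Hypothetical in-memory media file, only for this sketch."""

    def __init__(self, name, data=b""):
        self.name = name
        self.buffer = io.BytesIO(data)
        self.finalized = False

    def getFileName(self):
        return self.name

    def read(self, size=-1):
        return self.buffer.read(size)

    def write(self, chunk):
        return self.buffer.write(chunk)

    def finalizeWrite(self):
        self.finalized = True


class MemoryMediaStore(MediaStore):
    """Hypothetical store that keeps MemoryFile objects in a dict."""

    def __init__(self):
        self.files = {}

    def list(self):
        # inside the method body, "list" still refers to the builtin
        return list(self.files.values())

    def openForWrite(self, fileName):
        newFile = MemoryFile(fileName)
        self.files[fileName] = newFile
        return newFile


store = MemoryMediaStore()
store.addFile(MemoryFile("episode1.mp3", b"x" * 20000))
print(store.files["episode1.mp3"].buffer.getvalue() == b"x" * 20000)  # True
```

The design choice here is the same one IMediaStore makes: only the store-specific parts (listing and opening a destination) vary, while the copy loop lives once in the base class.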

In contrast to the previous generation of podgrabber, I now have nice chunks of functionality which are logically separated and aren’t going to get overly complex. Both the filesystem-based media store and the RSS media store know how to list the files they contain.
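
To see what the RSS store’s list() is matching, here is a small Python 3 sketch that runs the same "*/item" and enclosure lookups against a made-up RSS 2.0 snippet, using xml.etree.ElementTree (the standard-library home of the elementtree package since Python 2.5). Like list(), it returns an empty list for an unparseable feed, and it skips items that have no enclosure URL. The sample feed and the enclosureUrls name are hypothetical.

```python
import xml.etree.ElementTree as ElementTree

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" length="1234" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="5678" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def enclosureUrls(feedText):
    """Return enclosure URLs in feed order, skipping items without one."""
    try:
        root = ElementTree.fromstring(feedText)
    except ElementTree.ParseError:
        # mirror RSSMediaStore.list(): a broken feed yields no files
        return []
    urls = []
    # "*/item" matches <item> elements that are grandchildren of the root,
    # i.e. rss/channel/item in an RSS 2.0 feed
    for item in root.findall("*/item"):
        enclosure = item.find("enclosure")
        # compare with None: an Element with no children is falsy
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


print(enclosureUrls(SAMPLE_FEED))
# ['http://example.com/ep2.mp3', 'http://example.com/ep1.mp3']
```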

Which brings me to the files themselves. I expect the files I’m dealing with to act something like “real” files on a filesystem. For HTTP, urllib makes that a little easier, since the objects it returns already behave much like files. But I know that when I start writing an implementation for my MTP-based Creative Zen Vision W, I’m going to run into some difficulty. So I created a media file interface and implemented an HTTP file and a filesystem file::

import urllib
import re
import os

class LocalFilePathDoesNotExist(Exception):
    """Exception raised by MediaFile objects when there is either no file on disk for this media
    file or the object just doesn't know about it."""
    pass

class FileNotReadable(Exception):
    """Exception raised by MediaFile objects when the file is not readable"""
    pass

class FileNotWritable(Exception):
    """Exception raised by MediaFile objects when the file is not writable"""
    pass

class IMediaFile:
    """MediaFiles may be files actually located on a hard drive, on a web
    server, or on a portable media device.  This interface describes what can
    be done with media files in order to copy them from one medium to another.

    In order to copy from a MediaStore, the MediaFile must either provide an
    implementation for read or getLocalFilePath.  read allows us to copy bytes
    of data from one location to another.  getLocalFilePath allows us to either
    open the actual file in read mode or use shutil.copyfile.  In order to copy
    to a MediaStore, the MediaFile must either provide an implementation for
    write or getLocalFilePath.  write allows us to copy bytes directly to the
    file.  getLocalFilePath allows us to either use shutil.copyfile or open the
    local file in write mode and copy bytes in.

    In the case of a media device which we don't have a filesystem interface to
    copy files to or the ability to write bytes directly to (such as an MTP
    device), finalizeWrite can come in handy.  One strategy is to write the file
    to a temporary location using either write or shutil.copyfile (after
    determining the tmp file's location with getLocalFilePath) and upon the call
    to finalizeWrite, we can copy the file to the device.  In the case of MTP
    devices, there are a number of mtp-* utilities which can copy files on a
    filesystem to the MTP device.

    """
    def __init__(self, location, mode="r", **kw):
        self.location = location
        self.mode = mode
        self.__dict__.update(kw) ##is this a kludge?  maybe.
        self._init()
    def read(self, bytes=None):
        raise FileNotReadable
    def write(self, chunk):
        raise FileNotWritable
    def getLocalFilePath(self):
        raise LocalFilePathDoesNotExist
    def getFileLocation(self):
        return self.location
    def finalizeWrite(self):
        raise FileNotWritable
    def getBytesRead(self):
        raise FileNotReadable
    def getBytesWritten(self):
        raise FileNotWritable
    def getFileName(self):
        raise FileNotWritable
    def _init(self):
        pass

test_quote = """['\"]"""
attach_re = re.compile(r'^(attachment|inline);\s*filename\s*=\s*' + test_quote + r'(.*?)' + test_quote)

class HTTPFile(IMediaFile):
    """HTTP file
    """
    def __str__(self):
        try:
            filePath = self.getLocalFilePath()
        except LocalFilePathDoesNotExist:
            filePath = "No File Path Found"
        return "<HTTPFile: %s>" % filePath

    def __repr__(self):
        try:
            filePath = self.getLocalFilePath()
        except LocalFilePathDoesNotExist:
            filePath = "No File Path Found"
        return "<HTTPFile: %s>" % filePath

    def _init(self):
        self.bytes_read = 0
        try:
            proxy = self.proxy
        except AttributeError:
            proxy = {}
        self.opener = urllib.FancyURLopener(proxy)
        self.opener_file = self.opener.open(self.location)

        self.filename = os.path.basename(self.opener_file.url)
        headers = self.opener_file.headers
        for key, val in headers.items():
            #print key,val
            #kludge to get the filename out of the MIME contents
            if (key == "content-disposition") and ((val.startswith("attachment;")) or (val.startswith("inline;"))):
                attach_match = attach_re.match(val)
                if attach_match:
                    self.filename = attach_match.groups()[1]

    def read(self, bytes=None):
        if bytes:
            chunk = self.opener_file.read(bytes)
        else:
            chunk = self.opener_file.read()
        self.bytes_read += len(chunk)
        return chunk

    def getBytesRead(self):
        return self.bytes_read

    def getFileName(self):
        return self.filename

    def getLocalFilePath(self):
        return self.location

class DiskFile(IMediaFile):
    def __str__(self):
        try:
            filePath = self.getLocalFilePath()
        except LocalFilePathDoesNotExist:
            filePath = "No File Path Found"
        return "<DiskFile: %s>" % filePath

    def __repr__(self):
        try:
            filePath = self.getLocalFilePath()
        except LocalFilePathDoesNotExist:
            filePath = "No File Path Found"
        return "<DiskFile: %s>" % filePath

    def _init(self):
        self.bytes_read = 0
        self.bytes_written = 0
        self.diskFile = open(self.location, self.mode)
        self.filename = os.path.basename(self.location)

    def __del__(self):
        self.diskFile.close()

    def read(self, bytes=None):
        if bytes:
            chunk = self.diskFile.read(bytes)
        else:
            chunk = self.diskFile.read()
        self.bytes_read += len(chunk)
        return chunk

    def write(self, chunk):
        retVal = self.diskFile.write(chunk)
        self.bytes_written += len(chunk)
        return retVal

    def getBytesRead(self):
        return self.bytes_read

    def getBytesWritten(self):
        return self.bytes_written

    def getFileName(self):
        return self.filename

    def getLocalFilePath(self):
        return self.location

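    def finalizeWrite(self):
        # flush and close the file so the copy is complete on disk;
        # __del__ closing it again later is harmless
        self.diskFile.close()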

class MTPFile(IMediaFile):
    pass
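
The Content-Disposition kludge in HTTPFile._init depends on that regular expression, so here is a standalone Python 3 sketch of it with the \s escapes written out in a raw string. As an alternative, it also shows the standard library’s email.message.Message.get_filename(), which copes with unquoted filenames that the regex misses. Both function names are hypothetical; the default argument plays the role of the URL basename that HTTPFile starts with.

```python
import re
from email.message import Message

# Matches e.g.: attachment; filename="episode 12.mp3"
# Group 2 holds the filename; unquoted filenames are NOT matched.
ATTACH_RE = re.compile(r"""^(attachment|inline);\s*filename\s*=\s*['"](.*?)['"]""")


def filenameFromDisposition(value, default):
    """Regex approach, as in HTTPFile._init: return filename or default."""
    match = ATTACH_RE.match(value)
    if match:
        return match.group(2)
    return default


def filenameViaEmail(value, default):
    """Alternative: let the email package parse the header parameters."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename(default)


print(filenameFromDisposition('attachment; filename="ep 12.mp3"', "fallback.mp3"))
# ep 12.mp3
print(filenameFromDisposition("inline; filename=ep.mp3", "fallback.mp3"))
# fallback.mp3
print(filenameViaEmail("inline; filename=ep.mp3", "fallback.mp3"))
# ep.mp3
```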

In order to move the files onto my Zen, I’ll have to flesh out the MTPFile class. (In case you don’t know, MTP is Microsoft’s Media Transfer Protocol.) There are a number of command line utilities which I can wrap in order to interact with an MTP device. There is also a library which would let me get closer to the metal, but unfortunately there are no Python bindings for the MTP library. So, I will either wrap the command line utilities or generate bindings for the library with SWIG.
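
The finalizeWrite strategy described in the IMediaFile docstring (stage the bytes in a temporary file, then hand that file to a device-specific tool) could look something like this Python 3 sketch. TempFileMediaFile and the push callable are hypothetical: for the Zen, push would be the place where one of the mtp-* utilities gets invoked (for example via subprocess), but the exact command line is an assumption left out here, so the usage example substitutes a function that just records the bytes.

```python
import os
import tempfile


class TempFileMediaFile:
    """Sketch: write to a temp file, push it to the device on finalizeWrite.

    push is a hypothetical callable taking (localPath, fileName); a real
    MTP version might wrap an mtp-* command line utility via subprocess.
    """

    def __init__(self, fileName, push):
        self.fileName = fileName
        self.push = push
        fd, self.tmpPath = tempfile.mkstemp(suffix=os.path.splitext(fileName)[1])
        self.tmpFile = os.fdopen(fd, "wb")  # binary mode: media files are not text
        self.bytesWritten = 0

    def getFileName(self):
        return self.fileName

    def getLocalFilePath(self):
        return self.tmpPath

    def write(self, chunk):
        self.tmpFile.write(chunk)
        self.bytesWritten += len(chunk)

    def getBytesWritten(self):
        return self.bytesWritten

    def finalizeWrite(self):
        # Close first so every byte is flushed to disk before pushing.
        self.tmpFile.close()
        try:
            self.push(self.tmpPath, self.fileName)
        finally:
            # The temp copy is only a staging area; remove it either way.
            os.remove(self.tmpPath)


received = {}


def fakePush(localPath, fileName):
    # Stand-in for a real device transfer: just record the bytes.
    with open(localPath, "rb") as staged_file:
        received[fileName] = staged_file.read()


staged = TempFileMediaFile("episode1.mp3", fakePush)
staged.write(b"ID3")
staged.write(b"\x00" * 10)
staged.finalizeWrite()
print(received["episode1.mp3"] == b"ID3" + b"\x00" * 10)  # True
```

Because this object only needs write() and finalizeWrite(), it would slot into the existing IMediaStore.addFile loop without changing that loop.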

Given this foundation, the following code will take the first 10 items from CNet’s “Buzz out Loud” podcast and store them in my “/home/jmjones/mediaStoreTest” directory::

import mediaStore

sourceStore = mediaStore.RSSMediaStore("http://www.cnet.com/i/pod/cnet_buzz.xml")
destStore = mediaStore.FileSystemMediaStore("/home/jmjones/mediaStoreTest")

for f in sourceStore.list()[:10]:
    destStore.addFile(f)

This was a quick tour of my file handling code. Next time, I’ll get into a general way of synchronizing files between any two media stores.