This is the third in an N-part series on rewriting my podcast grabbing application. Here are the links to parts one and two. In part two, I promised to get into a common way of synchronizing media files between media stores.
Delivering on that promise, here is my SyncManager:
class SyncManager(object):
    """This is a concrete implementation of a synchronization manager which is
    intended to be subclassed if necessary.
    A SyncManager connects two mediaStores with filters and processing steps.
    It should be able to copy files from the fromStore to the toStore, exclude
    any files which were filtered out, and execute any processingSteps along
    the way.
    """
    def __init__(self, fromStore, toStore, copyFilters, deleteFilters, preProcessingSteps, postProcessingSteps):
        self.fromStore = fromStore
        self.toStore = toStore
        self.copyFilters = copyFilters
        self.deleteFilters = deleteFilters
        self.preProcessingSteps = preProcessingSteps
        self.postProcessingSteps = postProcessingSteps
        self._init()
    def _init(self):
        ##hook for subclasses to extend initialization; a no-op by default
        pass
    def getCopyList(self):
        """return the set of files we need to copy from the fromStore to the toStore"""
        ##the built-in set type replaces the deprecated sets.Set
        copySetAll = set(self.fromStore.list())
        copySet = set()
        ##named copyFilter rather than filter to avoid shadowing the built-in
        for copyFilter in self.copyFilters:
            copySet = copySet.union(copyFilter.filter(self.fromStore, self.toStore))
        ##a poorly written filter could feasibly add things to the list that weren't
        ##initially there. Use set arithmetic to return all things which the filters
        ##returned which were also in the original set.
        copySet = copySet.intersection(copySetAll)
        return copySet
    def getDeleteList(self):
        """return the set of files that we need to delete from the toStore"""
        deleteSetOriginal = set(self.toStore.list())
        ##start with an empty set - we'll build up a set of what to delete.
        deleteSet = set()
        for deleteFilter in self.deleteFilters:
            deleteSet = deleteSet.union(deleteFilter.filter(self.fromStore, self.toStore))
        ##a poorly written filter could feasibly add things to the list that weren't
        ##initially there. Use set arithmetic to return all things which the filters
        ##returned which were also in the original set.
        deleteSet = deleteSet.intersection(deleteSetOriginal)
        return deleteSet
    def setOverrideCopyList(self, copyList):
        ##not implemented yet
        pass
    def syncCopy(self):
        for mediaFile in self.getCopyList():
            for preProcessingStep in self.preProcessingSteps:
                mediaFile = preProcessingStep.process(mediaFile)
            self.toStore.addFile(mediaFile)
            for postProcessingStep in self.postProcessingSteps:
                mediaFile = postProcessingStep.process(mediaFile)
    def syncDelete(self):
        ##not implemented yet; getDeleteList() already computes what to remove
        pass
    def syncAll(self):
        ##do delete first because that is often nicer to lower-memory
        ##media devices
        self.syncDelete()
        self.syncCopy()
This was nearly the final piece of the puzzle for creating a simple automated RSS download manager. I introduced several new concepts to the mix, including copy and delete filters, and pre- and post-processing steps. The copy filter(s) help determine which files to copy, or more to the point, which files not to copy, from the source media store. I’m not using delete filters yet, but they would determine which files to remove from the destination media store (the toStore) when I get the “sync to MP3 player” functionality working.
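To make the filter contract a bit more concrete, here is a minimal sketch under a few assumptions: a filter is any object with a filter(fromStore, toStore) method that returns file names, and a media store only needs a list() method. InMemoryStore, NotYetCopiedFilter, OrphanDeleteFilter, BogusFilter and combine() are hypothetical names for illustration, not podgrabber classes. combine() mirrors the union-then-intersect arithmetic in getCopyList() and getDeleteList().

```python
# Hypothetical stand-ins illustrating the filter interface SyncManager expects.

class InMemoryStore(object):
    """Hypothetical media store that just holds a collection of file names."""
    def __init__(self, names):
        self.names = set(names)

    def list(self):
        return list(self.names)


class NotYetCopiedFilter(object):
    """Copy filter: keep only source files missing from the destination."""
    def filter(self, fromStore, toStore):
        return set(fromStore.list()) - set(toStore.list())


class OrphanDeleteFilter(object):
    """Delete filter: flag destination files that no longer exist in the source."""
    def filter(self, fromStore, toStore):
        return set(toStore.list()) - set(fromStore.list())


class BogusFilter(object):
    """A 'poorly written' filter that returns a name the source never listed."""
    def filter(self, fromStore, toStore):
        return {"not-in-source.mp3"}


def combine(filters, fromStore, toStore, allowed):
    """Union every filter's result, then clip it to the allowed store's listing."""
    result = set()
    for f in filters:
        result |= set(f.filter(fromStore, toStore))
    return result & set(allowed.list())


source = InMemoryStore(["ep1.mp3", "ep2.mp3", "ep3.mp3"])
player = InMemoryStore(["ep1.mp3", "old.mp3"])

# Copy side is clipped to the source listing, so BogusFilter's name is dropped.
to_copy = combine([NotYetCopiedFilter(), BogusFilter()], source, player, source)
# Delete side is clipped to the destination listing, as in getDeleteList().
to_delete = combine([OrphanDeleteFilter()], source, player, player)
print(sorted(to_copy))    # ['ep2.mp3', 'ep3.mp3']
print(sorted(to_delete))  # ['old.mp3']
```

Because several filters are unioned, each one can stay small and single-purpose; the final intersection is what keeps a misbehaving filter from inventing files.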
Processing steps were how I decided to handle the problem of keeping track of which files have been downloaded. This is also how I plan on accommodating changes to certain ID3 tags upon download or sync-to-MP3-player. After a file is copied from one media store to another, the post-processing steps are called. Notice that in the syncCopy() method the processing steps, both pre and post, return the mediaFile, and the returned media file is re-bound to the same name we were using while processing it.
def syncCopy(self):
    for mediaFile in self.getCopyList():
        for preProcessingStep in self.preProcessingSteps:
            mediaFile = preProcessingStep.process(mediaFile)
        self.toStore.addFile(mediaFile)
        for postProcessingStep in self.postProcessingSteps:
            mediaFile = postProcessingStep.process(mediaFile)
This allows processing steps to be chained, with each successive step able to manipulate the file. I actually had more things in mind than just podgrabber for this. I’m seriously considering removing the virtual filesystem functionality from podgrabber and spinning it off into another open source project. But one thing at a time, right?
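Here is a small sketch of that chaining pattern in isolation. It assumes a processing step is any object with a process(mediaFile) method that returns a (possibly new) media file; MediaFile, SetTagStep, PrefixNameStep and run_steps() are hypothetical names, and the "tags" dict only stands in for ID3 tags, it does not touch real MP3 files.

```python
from collections import namedtuple

# Hypothetical stand-in for a media file: a name plus a dict of tag values.
MediaFile = namedtuple("MediaFile", ["name", "tags"])


class SetTagStep(object):
    """Return a copy of the file with one tag overridden (e.g. a genre tag)."""
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def process(self, mediaFile):
        tags = dict(mediaFile.tags)  # copy so the input file is left untouched
        tags[self.key] = self.value
        return mediaFile._replace(tags=tags)


class PrefixNameStep(object):
    """Return a copy of the file with a prefix added to its name."""
    def __init__(self, prefix):
        self.prefix = prefix

    def process(self, mediaFile):
        return mediaFile._replace(name=self.prefix + mediaFile.name)


def run_steps(mediaFile, steps):
    # Same pattern as syncCopy(): rebind the name to each step's return value,
    # so every step sees the output of the step before it.
    for step in steps:
        mediaFile = step.process(mediaFile)
    return mediaFile


original = MediaFile("ep1.mp3", {"artist": "CNET"})
result = run_steps(original, [SetTagStep("genre", "Podcast"), PrefixNameStep("buzz-")])
print(result.name)          # buzz-ep1.mp3
print(result.tags["genre"]) # Podcast
print(original.name)        # ep1.mp3
```

Returning a value, rather than mutating in place, means a step is free to hand back an entirely different object (for example, a re-encoded file) and the next step will pick it up.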
So, we finally have a working podgrabber. Here is a script which will run podgrabber in an unattended way and keep track of what it has downloaded:
from podgrabber import syncManager
from podgrabber import mediaStore
from podgrabber import filter
from podgrabber import processingSteps
import os
dl_base = 'download'
db_filename = 'podgrabber.db'
rss_list = (
    ##Name, rss url, dl directory
    ('Buzz Out Loud', 'http://www.cnet.com/i/pod/cnet_buzz.xml', 'buzz'),
    ('News.com Daily', 'http://news.com.com/2325-11424_3-0.xml', 'news.com'),
)
db_copy_filter = filter.DbCopyFileFilter(db_filename)
update_processing_step = processingSteps.UpdateDBStep(db_filename)
sm_list = []
copy_filters = [db_copy_filter]
delete_filters = []
pre_proc = []
post_proc = [update_processing_step]
for name, url, directory in rss_list:
    fromStore = mediaStore.RSSMediaStore(url)
    toStore = mediaStore.FileSystemMediaStore(os.path.join(dl_base, directory))
    sm = syncManager.SyncManager(fromStore, toStore, copy_filters, delete_filters, pre_proc, post_proc)
    copy_list = list(sm.getCopyList())
    print(copy_list)
    sm.syncAll()
    sm_list.append(sm)
#for f in copy_list[:-3]:
#    print 'Updating', f
#    update_processing_step.process(f)
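The script relies on filter.DbCopyFileFilter and processingSteps.UpdateDBStep, whose code isn't shown here. The following is only a guess at how such a pair could cooperate, not their actual implementation: it assumes a SQLite table (via Python's standard sqlite3 module), treats media files as plain name strings, and uses the hypothetical names DownloadLog, DbCopyFilterSketch, UpdateDbStepSketch and ListStore. Both objects share one DownloadLog here because every connection to ":memory:" gets its own separate database; with a real file such as 'podgrabber.db' they could each open their own connection.

```python
import sqlite3


class DownloadLog(object):
    """Hypothetical wrapper around a SQLite table of already-downloaded names."""
    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS downloaded (name TEXT PRIMARY KEY)")

    def names(self):
        return set(row[0] for row in self.conn.execute("SELECT name FROM downloaded"))

    def record(self, name):
        # INSERT OR IGNORE makes recording the same file twice harmless.
        self.conn.execute("INSERT OR IGNORE INTO downloaded (name) VALUES (?)", (name,))
        self.conn.commit()


class DbCopyFilterSketch(object):
    """Copy filter: everything in the source that the log hasn't seen yet."""
    def __init__(self, log):
        self.log = log

    def filter(self, fromStore, toStore):
        return set(fromStore.list()) - self.log.names()


class UpdateDbStepSketch(object):
    """Post-processing step: record the file, then hand it on unchanged."""
    def __init__(self, log):
        self.log = log

    def process(self, mediaFile):
        self.log.record(mediaFile)
        return mediaFile


class ListStore(object):
    """Hypothetical source store standing in for an RSS feed listing."""
    def __init__(self, names):
        self._names = list(names)

    def list(self):
        return self._names


log = DownloadLog(":memory:")
feed = ListStore(["ep1.mp3", "ep2.mp3"])
copy_filter = DbCopyFilterSketch(log)
step = UpdateDbStepSketch(log)

first_pass = copy_filter.filter(feed, None)   # toStore is unused by this filter
for name in sorted(first_pass):
    step.process(name)                        # what syncCopy() would do after addFile()
second_pass = copy_filter.filter(feed, None)
print(sorted(first_pass))   # ['ep1.mp3', 'ep2.mp3']
print(sorted(second_pass))  # []
```

The second pass comes back empty because the post-processing step recorded every file the first pass let through, which is the "keep track of what it has downloaded" behavior in miniature.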
This is a working but limited script. There is no GUI. There are no status updates on how far along a download has progressed. The whole thing runs in a single thread, and that is probably the biggest stinker among the limitations. I can either handle threading at the very top level and spin off each syncManager into its own thread, which would be the easy way, or introduce something like a syncTaskManager that the syncManagers would hand tasks to, and that would manage a thread pool and the execution of those tasks. Hmmmm… a task manager isn’t a half-bad idea, come to think of it.
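A rough sketch of the thread-pool flavor of that idea, assuming Python 3's standard concurrent.futures module (on Python 2 it needs the separately installed `futures` backport). SyncTaskManager and FakeSyncManager are hypothetical; the fake's syncAll() just sleeps to simulate a slow download and returns its feed name so there is something to check.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SyncTaskManager(object):
    """Hypothetical task manager: runs submitted callables on a fixed pool of
    worker threads and collects their results in submission order."""
    def __init__(self, max_workers=4):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.futures = []

    def submit(self, task, *args):
        future = self.executor.submit(task, *args)
        self.futures.append(future)
        return future

    def wait_all(self):
        # result() blocks until each task finishes and re-raises any exception
        # the task raised in its worker thread.
        results = [f.result() for f in self.futures]
        self.executor.shutdown(wait=True)
        return results


class FakeSyncManager(object):
    """Stand-in for SyncManager; syncAll() only simulates a slow download."""
    def __init__(self, name):
        self.name = name

    def syncAll(self):
        time.sleep(0.1)
        return self.name


manager = SyncTaskManager(max_workers=2)
for feed in ["buzz", "news.com"]:
    manager.submit(FakeSyncManager(feed).syncAll)
results = manager.wait_all()
print(results)  # ['buzz', 'news.com']
```

Submitting one syncAll per feed is the "easy way" from the paragraph above; a finer-grained manager could instead accept one task per file, which would also be a natural place to hang progress updates.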
OK - so for next time, I’ll either start re-implementing the GUI for podgrabber, or I’ll introduce a task manager for nicer, cleaner threading. What am I talking about? I don’t even have threading yet!

