Related link: http://zephyrfalcon.org/weblog2/arch_e10_00590.html#e591
Good old Daily Python-URL led me to Hans Nowak’s musings on controlling the operation of generators from the “outside” once they had started running. Interestingly enough, his technique is precisely the same as one I (successfully) experimented with in my recently-announced Scimitar, an ISO Schematron implementation in Python.
Warning: This article is soaked in the inevitably arcane ideas introduced by the exotic code control flow opened up by Python’s generators. I do think these ideas are well worth soaking in if you’d like to harness Python’s full power. Way back when, I was very excited to hear of the inclusion of generators in Python because I had many plans to use them to supercharge XML processing in Python. I’m starting to harvest a good lot of that fruit for the market.
In my own work towards more flexible generators I started with David Mertz’s brilliant explorations in his article “Generator-based state machines”. The Mertz technique simulates co-routines (and Mertz has extended the technique to simulate microthreads). Each generator yields a control cookie to a central scheduler, and the cookie names the co-routine that should receive control next. The generators also yield a global value (which Mertz called “cargo”) that can be seen within the body of subsequently invoked generators. No summary can do justice to this idea. If you haven’t read the article mentioned earlier in this paragraph, consider opening up a new browser tab or window and giving it a quick read.
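To make the cookie-and-cargo idea concrete, here is a minimal sketch in modern Python. It is not the code from Mertz's article; the generator names `ones` and `tens`, the string cookies, and the `scheduler` function are all made up for illustration.

```python
# Hypothetical sketch of scheduler-dispatched generators (not Mertz's code).
cargo = 0  # global value that every generator can read

def ones():
    while True:
        value = cargo + 1
        # Yield a control cookie (name of the next generator) plus new cargo
        yield ("tens" if value % 3 == 0 else "ones", value)

def tens():
    while True:
        yield ("ones", cargo + 10)

def scheduler(start, limit):
    """Pass control to whichever generator the last cookie named."""
    global cargo
    cargo = 0
    # Create each generator once so it keeps its own state between turns
    coroutines = {"ones": ones(), "tens": tens()}
    current = start
    trace = []
    while cargo < limit:
        trace.append((current, cargo))
        current, cargo = next(coroutines[current])
    return cargo, trace

final, trace = scheduler("ones", 30)
print(final)      # 37
print(trace[:4])  # [('ones', 0), ('ones', 1), ('ones', 2), ('tens', 3)]
```

The scheduler is the only code that calls `next()`, so the generators never invoke each other directly. They merely say who should run next.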
The Mertz technique is certainly sound, but what Hans and I separately worked out was the refinement of encapsulating the generators in a class, and using instance values to share cargo between generators. This tweak not only makes the code a little cleaner, but also provides some allowance for re-entrance, since each instance carries its own cargo. I also found that in most cases you can ditch the scheduler and lose very little of the power of the Mertz technique by using only semi-co-routines (roughly, code that invokes generators, which can in turn invoke further generators).
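A rough sketch of that refinement follows. The `Tokenizer` class and its methods are hypothetical, invented only to show generators as methods that share cargo through instance attributes, with no scheduler and one generator invoking another.

```python
class Tokenizer:
    """Hypothetical example: generators as methods, cargo as instance state."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # cargo shared by the generator methods below

    def words(self):
        # Outer generator: skips whitespace, then delegates to letters()
        while self.pos < len(self.text):
            if self.text[self.pos].isspace():
                self.pos += 1
                continue
            yield "".join(self.letters())

    def letters(self):
        # Inner generator: advances the shared position as it is consumed
        while self.pos < len(self.text) and not self.text[self.pos].isspace():
            yield self.text[self.pos]
            self.pos += 1

print(list(Tokenizer("semi co routines").words()))
# ['semi', 'co', 'routines']
```

Because the cargo lives on the instance rather than in a module-level global, two `Tokenizer` objects can be consumed in an interleaved fashion without stepping on each other.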
My experiments led to some pretty powerful mechanisms for taming the notorious complexity of SAX processing. SAX involves having the XML parser call back into user code to handle elements, text, etc., not unlike classic event-driven GUI programming. This typically calls for complex state machines to stitch the various snippets of handler code into coherent logic. In parallel to my experiments with generators, I’ve been experimenting with techniques for helping automate (no pun intended) the state machine design. See my latest article, “Decomposition, Process, Recomposition” for an example of how such state machine factories can boost XML processing. The _state_machine class in listing 2 is the crux.
In Scimitar, rather than making the state machine easier to architect, I went for largely eliminating it through the co-routine approach (as I mentioned, semi-co-routines seem to work just as well in practice). Specifically, I passed SAX events to generators, which would then effectively knit the separate call-backs into one smooth run of code, allowing me to use local variables and the like to make state management a cinch.
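The following is a minimal sketch of the general idea, not Scimitar's actual code. The `TitleCollector` class, its `_feed` helper, and the `collect_titles` function are hypothetical names. Each SAX callback stores the event on the instance and resumes a single generator, so the logic for gathering the text of `title` elements reads as one straight run of code.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: SAX callbacks resume one generator."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.event = None  # cargo: the most recent SAX event
        self.logic = self.run()
        next(self.logic)   # advance to the first yield to await an event

    def run(self):
        while True:
            yield
            kind, data = self.event
            if kind == "start" and data == "title":
                chunks = []  # a local variable stands in for state flags
                while True:
                    yield
                    kind, data = self.event
                    if kind == "end" and data == "title":
                        break
                    if kind == "text":
                        chunks.append(data)
                self.titles.append("".join(chunks))

    def _feed(self, kind, data):
        self.event = (kind, data)
        next(self.logic)

    def startElement(self, name, attrs):
        self._feed("start", name)

    def endElement(self, name):
        self._feed("end", name)

    def characters(self, content):
        self._feed("text", content)

def collect_titles(xml_text):
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles

doc = "<doc><title>One</title><p>skip</p><title>Two</title></doc>"
print(collect_titles(doc))  # ['One', 'Two']
```

Without the generator, the handler would need an explicit "inside a title" flag checked in every callback. Here the position in `run()` is the state.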
I am very excited by these fresh directions in Pythonic XML processing, and I expect to marry the self-assembling state machine technique with the semi-co-routine technique in future Python/XML columns, and in future releases of running code. I do want to remind readers that Scimitar provides a very practical example of these techniques, implementing a Schematron-to-Python compiler in fewer than 500 lines of Python/SAX code. Even if you’re not interested in XML validation using Schematron, have a look at the code and let me know what you think of the basic technique. I already have plans for streamlining it, but that’s for another release.
I must say that from my perspective it’s hard to imagine a more exciting time to be processing XML using Python.
Have you tried co-routines or semi-co-routines in practice?