Related link: http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath2.html

I designed the Amara XML Toolkit to make the simple things easy and the complex things possible. I’m open to honest, constructive criticism of where I failed in that aim, but I don’t want any misconceptions floating around.

Cutting to the high-speed chase scene, here is how Nelson Minar can do what he wants in Amara:

from amara import binderytools
doc = binderytools.bind_file("foo.opml")

for outline in doc.xpath("//outline"):
    print outline.xmlUrl

If someone thinks that’s too complex, I’ll be happy to hear ideas for making it simpler. It’s four lines of code, very similar to the ElementTree example. In my previous post I was under the impression that Nelson really wanted to use XPath in attributes, so I showed how to make that possible in Amara. He somehow took that to mean that throwing in such a rule is the only way to parse a document in Amara.
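For comparison, here is a minimal sketch of the same task using the Python standard library’s xml.etree.ElementTree. It is my own approximation of the ElementTree approach, not Nelson’s actual code. It is written for Python 3, and the embedded OPML string is a hypothetical stand-in for foo.opml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for foo.opml; a real file could be loaded with ET.parse("foo.opml")
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Feed A" xmlUrl="http://example.com/a.xml"/>
      <outline text="Feed B" xmlUrl="http://example.com/b.xml"/>
    </outline>
  </body>
</opml>"""


def feed_urls(xml_text):
    """Return the xmlUrl of every outline element, at any depth, in document order."""
    root = ET.fromstring(xml_text)
    # ".//outline" is ElementTree's limited XPath subset for "any descendant named outline".
    # Category outlines that carry no xmlUrl attribute are skipped.
    return [o.get("xmlUrl") for o in root.findall(".//outline") if o.get("xmlUrl")]


for url in feed_urls(SAMPLE_OPML):
    print(url)
```

ElementTree supports only a small subset of XPath, but `.//outline` is enough for this job.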

In reality, 90% of Amara users will never need to invoke a special rule while parsing XML. The defaults are generally fine: they are tuned to balance speed and memory use against functionality.

Amara does let you switch custom behaviors on and off with simple declarative rules, and it lets you scope those rules to just portions of a document. I think this is a good way to save users a lot of code. Yes, the downside is that you have to learn the available rules, but that is inevitable, and I’ve always thought it’s easier to read documentation on an existing capability than to write code to reinvent it.

But as I always say, code speaks louder than words, so here is more. Above, I challenged folks to show how they could make the Amara bindery example simpler. Well, in my latest release of Amara I decided to take on that challenge myself. Amara 0.9.2 introduces Pushbind. With Pushbind, here is code that does what Nelson wants:

from amara import binderytools
for frag in binderytools.pushbind('outline', source='foo.opml'):
    print frag.outline.xmlUrl

There you go. One fewer line, and to all appearances the XML is just another Python object coming in from an iterator. One nice bonus is that it is extremely memory efficient: in general, it never uses much more memory than it takes to represent one outline element, whether foo.opml is 1KB or 1MB.
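The idea behind Pushbind (hand the caller one complete subtree at a time, then discard it) can be approximated with the standard library’s ElementTree.iterparse. The sketch below is not Amara’s implementation. push_elements is a hypothetical helper name, the sample document is made up, and the code assumes the target elements are not nested inside one another.

```python
import io
import xml.etree.ElementTree as ET


def push_elements(source, tag):
    """Yield each complete `tag` element from `source` as it is parsed.

    `source` is a filename or binary file object. A yielded element is only
    valid until the consumer asks for the next one: it is then cleared and
    detached from its parent, so memory stays roughly bounded by one element
    (plus its still-open ancestors) rather than the whole document.
    Assumes `tag` elements are not nested inside one another.
    """
    open_elems = []  # stack of elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        open_elems.pop()  # this "end" event closes the top of the stack
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop children, text and attributes
            if open_elems:
                open_elems[-1].remove(elem)  # detach from the parent


# Hypothetical miniature OPML document standing in for foo.opml
SAMPLE_OPML = b"""<opml version="1.1"><body>
<outline text="Feed A" xmlUrl="http://example.com/a.xml"/>
<outline text="Feed B" xmlUrl="http://example.com/b.xml"/>
</body></opml>"""

for outline in push_elements(io.BytesIO(SAMPLE_OPML), "outline"):
    print(outline.get("xmlUrl"))
```

The key design choice is that each element is cleared and detached once the loop moves on. That is what keeps memory roughly proportional to one record, and it also means anything you need from an element must be pulled out inside the loop body.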

As an illustration for general users, the following code prints all verses containing the word ‘begat’ in Jon Bosak’s Old Testament in XML, a 3.3MB document, again without ever needing to hold the entire document in memory (although there is always the possibility that the loop will outrun Python’s garbage collector).

from amara import binderytools
for frag in binderytools.pushbind('v',source='ot.xml'):
    text = unicode(frag.v)
    if 'begat' in text:
        print text.encode('utf-8')  # There's some non-ASCII in ot.xml

I personally think that Pushbind handles just about all of the cases that make people turn to SAX.
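To make that comparison concrete, here is roughly what the begat filter looks like when written directly against SAX, using Python 3’s standard xml.sax module. This is an illustrative sketch rather than code from Amara. VerseFilter is a hypothetical name, and the sample document is a made-up miniature, not the real structure of ot.xml.

```python
import io
import xml.sax


class VerseFilter(xml.sax.ContentHandler):
    """Collect the text of every <v> element that contains `word`."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.matches = []
        self._chunks = None  # text pieces while inside a <v>, otherwise None

    def startElement(self, name, attrs):
        if name == "v":
            self._chunks = []

    def characters(self, content):
        # A SAX parser may split one run of text across several calls,
        # so accumulate the pieces until the element closes.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "v":
            text = "".join(self._chunks)
            if self.word in text:
                self.matches.append(text)
            self._chunks = None


# Made-up miniature document; the real ot.xml is far larger
SAMPLE = b"""<book><chapter>
<v>And Irad begat Mehujael.</v>
<v>In the beginning God created the heaven and the earth.</v>
</chapter></book>"""

handler = VerseFilter("begat")
xml.sax.parse(io.BytesIO(SAMPLE), handler)
for verse in handler.matches:
    print(verse)
```

The flag-and-buffer bookkeeping in the handler is the kind of state management that a Pushbind-style iterator takes off the programmer’s hands.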