Related link: http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath.html

First of all, here are the three snippets Nelson posted:

PyXML

from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.opml').documentElement
for url in xpath.Evaluate('//@xmlUrl', doc):
  print url.value

My take: this uses the ancient 4DOM code. I expect it to be slow as hell and to suck all the memory out of your computer. People, avoid the line "from xml.dom.ext.reader import Sax2" like the plague. If there are docs that still suggest it, they really should be fixed. If you do use PyXML, use minidom, though I have not been much of an advocate of PyXML for ages.
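For what it's worth, here is a rough sketch of what the minidom route might look like. It assumes the standard library's xml.dom.minidom (which exposes the same DOM API as the PyXML one) and current Python print syntax rather than the older style used in the snippets here. minidom has no XPath engine, so the sketch walks every element and checks for the attribute instead of evaluating //@xmlUrl. The helper name xml_urls_minidom and the inline sample OPML are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document standing in for foo.opml.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="News">
      <outline text="A" xmlUrl="http://example.org/a.xml"/>
      <outline text="B" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>"""


def xml_urls_minidom(opml_text):
    """Return every xmlUrl attribute value, in document order."""
    doc = minidom.parseString(opml_text)
    try:
        # "*" matches all elements; keep only those carrying xmlUrl.
        return [el.getAttribute("xmlUrl")
                for el in doc.getElementsByTagName("*")
                if el.hasAttribute("xmlUrl")]
    finally:
        # Break the DOM's internal reference cycles once we are done.
        doc.unlink()


for url in xml_urls_minidom(SAMPLE_OPML):
    print(url)
```

To read from a file instead, minidom.parse("foo.opml") would replace the parseString call.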

libxml2

import libxml2
doc = libxml2.parseFile('foo.opml')
for url in doc.xpathEval('//@xmlUrl'):
  print url.content

My take: as Nelson admits, this snippet is very deceptive. It doesn’t show even a fraction of the hair-pulling that would characterize a real-world version of the same code. It ignores the fact that libxml2 forces you to do your own memory management, that it requires hideous C-ish idioms to work through the XPath results, and so on.

ElementTree

from elementtree import ElementTree
tree = ElementTree.parse("foo.opml")
for outline in tree.findall("//outline"):
  print outline.get('xmlUrl')

My take: ElementTree is always a breath of fresh air, but Nelson mentions that he was hampered by its XPath limitations (no attribute axis, for example). Well, maximum simplicity and maximum performance always come at some cost.
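One way around the missing attribute axis is to iterate over the elements and pull the attribute off each one, which is roughly what //@xmlUrl would select. The sketch below assumes the xml.etree.ElementTree module bundled with current Python (the descendant of the standalone elementtree package used above), so the import and print syntax differ from Nelson's snippet. The helper name xml_urls_etree and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for foo.opml.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="News">
      <outline text="A" xmlUrl="http://example.org/a.xml"/>
      <outline text="B" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>"""


def xml_urls_etree(opml_text):
    """Return every xmlUrl attribute value, in document order."""
    root = ET.fromstring(opml_text)
    # iter() yields the root and all of its descendants, so any element
    # that carries xmlUrl is picked up, not just <outline>.
    return [el.get("xmlUrl") for el in root.iter()
            if el.get("xmlUrl") is not None]


for url in xml_urls_etree(SAMPLE_OPML):
    print(url)
```

This gives back plain strings rather than attribute nodes, which is usually all a feed list needs.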

And out of my corner are the following offerings.

4Suite

from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseUri("foo.opml")
for url in doc.xpath("//@xmlUrl"):
    print url.value

Here you have 100% of XPath’s power, plus the option to extend XPath in Python, if need be. It’s also plenty fast these days, if not quite as fast as libxml2, and probably not as fast as cElementTree.

Amara

from amara import binderytools
rule = binderytools.preserve_attribute_details(u'*')
doc = binderytools.bind_file("foo.opml", rules=[rule])

for url in doc.xpath("//@xmlUrl"):
    print url.value

This looks very similar to the 4Suite example, apart from the imports and the declared rule. Amara does not make attributes available to XPath by default (to save space; I’d guess the reasoning is similar to ElementTree’s), but you can trivially enable them by asserting the rule above. 4Suite has no such limitation, but Amara’s edge shows more clearly when you’re not using XPath. For example, Amara would allow you to access an XHTML title easily, without needing XPath: print doc.html.head.title. This is what I mean by extreme Python-friendliness. I should point out, though, that Amara’s XPath implementation does have some other limitations, though none that most users are likely to run into.

Got code of your own?