Weblog:   The Python community has too many deceptive XML benchmarks
Subject:   "There's a Riot Goin' On"
Date:   2005-01-25 03:07:14
From:   faassen
Response to: "There's a Riot Goin' On"

I wouldn't say the speed of parsing XML into a Pythonic datastructure is completely alien to people's use. It can be done a lot more slowly, as has been shown in the past over and over, and cElementTree can do it very quickly.


That means we can now be far less concerned with parsing overhead. Since the structure is already Python-style, the overhead of ElementTree API calls can then be minimal, as is shown by the fast performance of the find operation in ElementTree. Non-C ElementTree find() sometimes can even beat libxml2 XPath, which is implemented in C.
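To make the find() point concrete, here is a minimal sketch of parsing into an ElementTree structure and then looking something up with find(). It assumes the xml.etree.ElementTree module from the modern standard library rather than the separate elementtree/cElementTree packages discussed here, and the sample document and the first_verse_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for this sketch.
SAMPLE = (
    "<bible><book><chapter>"
    "<v>In the beginning</v><v>And the earth</v>"
    "</chapter></book></bible>"
)


def first_verse_text(xml_text):
    """Parse the text once, then use find() on the resulting tree."""
    root = ET.fromstring(xml_text)
    # find() returns the first element matching the path, or None if there is none.
    verse = root.find("book/chapter/v")
    return verse.text if verse is not None else None


print(first_verse_text(SAMPLE))
```

Once the parse is done, the lookup is just a walk over ordinary Python objects, which is why the API call overhead can stay small.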


lxml.etree can parse very quickly too, using the underlying libxml2 library. Unfortunately, the work isn't "done" at that point if you want to use the ElementTree API, as Python proxies still have to be produced while the user accesses the XML. This has been made fairly fast by now, but it still lags behind ElementTree. For libxml2's native XPath this proxy overhead is far smaller, and you can get down to business right away.

If you want to know how I know all this, see my blog for a lot of benchmarking over the last couple of weeks. I didn't have a 'begat' test yet, but I did test a simple //v test, as Uche did in an earlier article.
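For anyone who wants to try a //v style test themselves, a rough sketch of how it could be timed follows. This is not the benchmark code from my blog: it assumes the standard library's xml.etree.ElementTree, a synthetic document, and hypothetical helper names (build_document, count_verses, time_verse_query), so the numbers it gives say nothing about cElementTree, lxml, or libxml2.

```python
import timeit
import xml.etree.ElementTree as ET


def build_document(n_verses):
    """Build a synthetic document with n_verses <v> elements (made-up data)."""
    verses = "".join("<v>verse %d</v>" % i for i in range(n_verses))
    return "<doc><chapter>%s</chapter></doc>" % verses


def count_verses(xml_text):
    """Parse the text and count all <v> descendants, like an XPath //v query."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//v"))


def time_verse_query(xml_text, repeat=5):
    """Time only the query, not the parse; returns the best time for 10 runs."""
    root = ET.fromstring(xml_text)
    timings = timeit.repeat(lambda: root.findall(".//v"), number=10, repeat=repeat)
    return min(timings)


doc = build_document(1000)
print(count_verses(doc))
print("best of 5 (10 queries each): %.6f s" % time_verse_query(doc))
```

Keeping the parse outside the timed section matters, since the whole point above is that parsing and querying have separate costs.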