
Weblog:   The Python community has too many deceptive XML benchmarks
Subject:   "There's a Riot Goin' On"
Date:   2005-01-24 17:06:26
From:   uche
Sly Stone said it.


Sure, I can accept that writing a proper test harness within Python is a better way to time it. And how useful it is that we can even have this discussion, since I didn't make a mystery of my benchmarking technique. And any adjustments are easy, since I didn't make a mystery of my code.
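Since part of the dispute is about how the timing was done, here is a minimal sketch of what an in-Python harness might look like. It is not the harness used by anyone in this thread. It assumes Python 3's xml.etree.ElementTree, which since Python 3.3 uses a C accelerator by default (the separate cElementTree module discussed here was deprecated and later removed from the standard library), and a hypothetical in-memory document, DOC, standing in for a real corpus file. Taking the minimum over several timeit trials is a common way to reduce noise from other processes.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical synthetic document standing in for a real corpus file.
# Verses are <v> elements, loosely modeled on Bible-style markup.
DOC = (
    "<book>"
    + "".join(f"<v>And verse {i} begat verse {i + 1}.</v>" for i in range(2000))
    + "</book>"
).encode("utf-8")


def parse_once():
    """Parse the in-memory document into an Element."""
    return ET.fromstring(DOC)


def time_parse(number=20, repeat=5):
    """Return the best per-run parse time in seconds.

    timeit.repeat runs the callable `number` times per trial and returns
    one total per trial; the minimum total is divided by `number`.
    """
    trials = timeit.repeat(parse_once, number=number, repeat=repeat)
    return min(trials) / number


if __name__ == "__main__":
    print(f"parse: {time_parse():.6f} s/run")
```

Because the document lives in memory, this measures parsing only, not disk I/O; timing a real file would mean deciding whether reading it belongs inside the timed region.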


The ensuing discussion is somewhat along the lines I suggested in the article (and it has such color and character to go with it). But crucially, the color interferes with any understanding that there is a lot more to test (I gave examples), and many more ways to test it, before we have MIPS-wars-quality benchmarks.


All the effbot bluster in the world does not change the fact that benchmarking requires transparency, which has been completely missing from the Python/XML gorilla match until today. And it doesn't change the fact that his benchmarks are useless, essentially measuring conditions completely alien to anyone's actual use.


So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.


Showing messages 1 through 2 of 2.

  • "There's a Riot Goin' On"
    2005-01-25 08:55:58  ialbert

    So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.


    What kind of excuse is that?

    You're the one who brought up the whole thing, yet it seems that you have done a worse job at benchmarking than others. Very ironic.

    I think your benchmarking method is very ad-hoc and you'd be better served if you fixed the glaring errors and posted an updated version of your findings.

    I'm getting incomparably better results with cElementTree (running the same program as you, but benchmarking it with timeit: around 0.25 seconds/run) on a similar laptop. I could not test your framework since your FTP system is down.

  • "There's a Riot Goin' On"
    2005-01-25 03:07:14  faassen

    I wouldn't say the speed of parsing XML into a Pythonic datastructure is completely alien to people's use. It can be done a lot more slowly, as has been shown in the past over and over, and cElementTree can do it very quickly.

    That means we can now be far less concerned with parsing overhead. Since the structure is already Python-style, the overhead of ElementTree API calls can then be minimal, as is shown by the fast performance of the find operation in ElementTree. Non-C ElementTree find() can sometimes even beat libxml2 XPath, which is implemented in C.
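    The point about parse cost versus API cost suggests timing the two separately. Below is a minimal sketch that parses once, outside the timed region, and then times only a query. The document is hypothetical, and the query (verses whose text contains "begat") is an assumption about what a "begat"-style test looks like, not the test from either article; it also walks the tree with iter() rather than calling find() or findall().

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical corpus: <v> verse elements, every third one containing "begat".
DOC = (
    "<book>"
    + "".join(
        f"<v>Adam begat son {i}.</v>" if i % 3 == 0 else f"<v>Plain verse {i}.</v>"
        for i in range(3000)
    )
    + "</book>"
)

# Parse once, outside the timed region, so only the query is measured.
ROOT = ET.fromstring(DOC)


def begat_verses(root):
    """Return the <v> elements whose text contains the word 'begat'."""
    return [v for v in root.iter("v") if v.text and "begat" in v.text]


def time_query(number=50, repeat=5):
    """Best per-run time, in seconds, of the query alone."""
    trials = timeit.repeat(lambda: begat_verses(ROOT), number=number, repeat=repeat)
    return min(trials) / number


if __name__ == "__main__":
    print(len(begat_verses(ROOT)), "matching verses")
    print(f"query: {time_query():.6f} s/run")
```

    Reporting parse time and query time as separate numbers makes it clearer whether a library wins on parsing, on API access, or both.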

    lxml.etree can do a parse very quickly too, using the underlying libxml2 library. Unfortunately, if you want to use the ElementTree API, the work isn't "done" at that point, as there are Python proxies to be produced while the user accesses the XML. This has been made fairly fast by now, but it still lags behind ElementTree. For libxml2 native XPath this proxy overhead is far less, and you can get down to business right away.

    If you want to know how I know all this, see my blog for a lot of benchmarking over the last couple of weeks. I didn't have a 'begat' test yet, but I did test a simple //v test, as Uche did in an earlier article.
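    For anyone reproducing a //v test with only the standard library, ElementTree's limited XPath subset expresses it as ".//v" relative to the root element. This is a minimal sketch over a hypothetical nested document; it uses xml.etree.ElementTree rather than lxml.etree or libxml2, so it illustrates the query form, not their relative speed.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document: verses (<v>) inside chapters inside books.
DOC = (
    "<bookcoll>"
    "<book><chapter><v>one</v><v>two</v></chapter></book>"
    "<book><chapter><v>three</v></chapter><chapter><v>four</v></chapter></book>"
    "</bookcoll>"
)

root = ET.fromstring(DOC)

# ElementTree's path subset: ".//v" selects every <v> below the root element.
# This matches XPath "//v" here because the root itself is not a <v>.
via_path = root.findall(".//v")

# Equivalent tree walk; iter() would also yield the root if it were a <v>.
via_iter = list(root.iter("v"))

# Both return elements in document order.
assert [v.text for v in via_path] == [v.text for v in via_iter]
print(len(via_path))  # 4
```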
