
Weblog:   The Python community has too many deceptive XML benchmarks
Subject:   "There's a Riot Goin' On"
Date:   2005-01-24 17:06:26
From:   uche
Sly Stone said it.


Sure, I can accept that writing a proper test harness within Python is a better way to time it. And how useful is it that we can even have this discussion, since I didn't make a mystery of my benchmarking technique. And any adjustments are easy, since I didn't make a mystery of my code.
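A minimal in-process harness might look like the following sketch. It assumes a made-up in-memory document and uses the standard `timeit` module. It reports the best per-run time over several batches, so interpreter startup and module imports stay out of the numbers. In current Python 3, `xml.etree.ElementTree` automatically uses the C accelerator that used to ship separately as cElementTree. The helper names here are illustrative and do not come from either author's code.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical test document: 1000 <v> elements inside a <doc> root.
XML_BYTES = b"<doc>" + b"".join(b"<v n='%d'>text</v>" % i for i in range(1000)) + b"</doc>"

def parse_once():
    """Parse the in-memory document once and return the root element."""
    return ET.fromstring(XML_BYTES)

def best_seconds_per_run(func, number=50, repeat=5):
    """Return the fastest average time per call across `repeat` batches of `number` calls."""
    batches = timeit.repeat(func, number=number, repeat=repeat)
    return min(batches) / number

if __name__ == "__main__":
    print("parse: %.6f s/run" % best_seconds_per_run(parse_once))
```

Taking the minimum of several batches is a common convention for reducing noise from other processes. It is a design choice, not the only defensible one.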


The ensuing discussion is somewhat along the lines I suggested in the article (and it has such color and character to go with it). But crucially, the color interferes with any understanding that there is a lot more to test (I gave examples), and many more ways to test it before we have MIPS-wars quality benchmarks.


All the effbot bluster in the world does not change the fact that benchmarking requires transparency, which has been completely missing from the Python/XML gorilla match until today. And it doesn't change the fact that his benchmarks are useless, essentially measuring conditions completely alien to anyone's actual use.


So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.


Showing messages 1 through 6 of 6.

  • "There's a Riot Goin' On"
    2005-01-25 08:55:58  ialbert

    So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.


    What kind of excuse is that?

    You're the one who brought up the whole thing, yet it seems that you have done a worse job at benchmarking than the others. Very ironic.

    I think your benchmarking method is very ad-hoc and you'd be better served if you fixed the glaring errors and posted an updated version of your findings.

    I'm getting incomparably better results with cElementTree on a similar laptop (running the same program as you, but benchmarking it with timeit: around 0.25 seconds/run). I could not test your framework since your FTP system is down.

    • "There's a Riot Goin' On"
      2005-01-25 13:45:13  oreillyuser

      "What kind of excuse is that?"

      Great non-point, Istvan Albert.

      There are two points I believe Uche made that have not been addressed despite all of Fredrik Lundh's (effbot) blustering here, on his blog, and on his pythonware daily site. One is that Fredrik's benchmark is pretty useless because it just loads an XML file into a data structure but does nothing significant with it. Two is that Fredrik's useless benchmarks then give the misleading impression that some other XML tools are horribly slower than they really are, when in fact most of the XML tools are quite comparable to one another speed-wise, and some of them are even better when you consider other issues like how easy they are to use. And really, since this is Python, ease of use is of primary importance. cElementTree may or may not be the fastest, but I don't believe it is the easiest to use or install.
      • "There's a Riot Goin' On"
        2005-01-25 15:02:30  effbot

        So the only supporter Uche can bring up posts anonymously, repeats Uche's nonsense, and uses exactly the same words, style and phrasing as Uche himself. Cute.

        (as for your so-called arguments, some hints: for three processes that run in sequence, the total time is A+B+C, not max(A, B, C). if you set A to zero, the total will drop. second, how hard is it to "click on installer" or type "python setup.py install"? thousands of people have already done it. I'm sure you can do it too, if you try. feel free to mail me if you need help.)
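The A+B+C arithmetic can be made concrete with a small sketch. The three-stage pipeline below (parse, query, serialize) and its input are hypothetical stand-ins, not anyone's published benchmark. It times each stage with `time.perf_counter`, and the end-to-end cost is the sum of the stages. Shrinking any one stage therefore lowers the total, while a parse-only measurement reports A alone.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical input: 5000 <v> elements under one root.
XML_BYTES = b"<doc>" + b"<v>x</v>" * 5000 + b"</doc>"

def timed(func, *args):
    """Run func(*args) once; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def run_pipeline(data):
    """Hypothetical three-stage pipeline: parse (A), query (B), serialize (C)."""
    root, a = timed(ET.fromstring, data)
    hits, b = timed(root.findall, ".//v")
    out, c = timed(ET.tostring, root)
    # Stages run one after another, so the total is the sum, not the max.
    return {"A_parse": a, "B_query": b, "C_serialize": c,
            "total": a + b + c, "hits": len(hits), "out_len": len(out)}

if __name__ == "__main__":
    stats = run_pipeline(XML_BYTES)
    for name in ("A_parse", "B_query", "C_serialize", "total"):
        print("%-12s %.6f s" % (name, stats[name]))
```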
        • "There's a Riot Goin' On"
          2005-01-27 07:28:04  huh??

          You later wrote: "... I just noted that someone was repeating Uche's arguments using Uche's words, with very little additional processing. I expect people to do a little more research before spouting off..."

          You must not have read what I wrote at all. Did you see the first sentence in my paragraph? "There are two points I believe Uche made that have not been addressed..."

          And then you complain that I was repeating Uche's arguments??? I was repeating Uche's arguments because....I was repeating Uche's arguments! Which you still have not addressed. Are you for real?
        • "There's a Riot Goin' On"
          2005-01-26 07:34:04  huh??

          "as for your so-called arguments, some hints: ..."

          I don't know what kind of argumentation strategy you are trying now (ad hominem followed by red herring?), but it has zero to do with the two points I mentioned. I just want to see a more rigorous and open benchmark used, and to see how the different tools compare when it comes to ease of use. I'm sure your cElementTree is fast, but it doesn't look like the easiest or most Pythonic to use.
  • "There's a Riot Goin' On"
    2005-01-25 03:07:14  faassen

    I wouldn't say the speed of parsing XML into a Pythonic datastructure is completely alien to people's use. It can be done a lot more slowly, as has been shown in the past over and over, and cElementTree can do it very quickly.

    That means we can now be far less concerned with parsing overhead. Since the structure is already Python-style, the overhead of ElementTree API calls can then be minimal, as shown by the fast performance of the find operation in ElementTree. The pure-Python (non-C) ElementTree find() can sometimes even beat libxml2 XPath, which is implemented in C.
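The sketch below illustrates what "already Python-style" means here. The sample document is invented for the example, and the code uses only the standard-library `xml.etree.ElementTree`. A parsed element behaves like a Python sequence of its children, with attributes available through `get()`/`attrib`. `find()` and `findall()` take simple path expressions evaluated against those children. The sketch says nothing about the relative speed claims above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

root = ET.fromstring(DOC)

# Elements behave like Python sequences of their child elements...
print(len(root))               # 2
print(root[1].get("id"))       # b2

# ...and find()/findall() take simple path expressions over those children.
first_title = root.find("book/title")   # first match only
all_titles = [t.text for t in root.findall("book/title")]
print(first_title.text, all_titles)     # First ['First', 'Second']
```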

    lxml.etree can do a parse very quickly too, using the underlying libxml2 library. Unfortunately, if you want to use the ElementTree API, it isn't "done" at that point, as Python proxies still have to be produced while the user accesses the XML. This has been made fairly fast by now, but it still lags behind ElementTree. For libxml2's native XPath this proxy overhead is far smaller, and you can get down to business right away.

    If you want to know how I know all this, see my blog for a lot of benchmarking over the last couple of weeks. I didn't have a 'begat' test yet, but I did test a simple //v test, as Uche did in an earlier article.
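For reference, a "//v"-style query can be written with the standard-library ElementTree as sketched below. The document is made up, and this is not faassen's or Uche's actual test. ElementTree supports only a limited XPath subset, in which "all v descendants" is spelled `.//v` relative to an element. `iter("v")` yields the same elements for this input because the root is not itself a `<v>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document with <v> elements at several depths.
DOC = "<doc><v/><a><v/><b><v/></b></a><v/></doc>"

root = ET.fromstring(DOC)

# ElementTree's path subset spells XPath "//v" relative to an element as ".//v".
via_path = root.findall(".//v")

# Equivalent tree walk; iter() would also yield the root itself if it were a <v>.
via_iter = list(root.iter("v"))

print(len(via_path), len(via_iter))  # 4 4
```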
