...is definitely not relevant to a benchmark, unless you're trying to compare Python to some other solution.
At the very least, you should break the statistics out into interpreter startup time, module import time, and the actual run time of whatever function represents the program's functionality. Keeping console output out of the timed code would be a good idea too, since terminal I/O can easily swamp the work you are trying to measure.
These things are pretty basic to benchmarking any tool for any language -- and I don't have an axe to grind about these tools; I actively try to avoid XML as much as possible anyway. :)
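
For what it's worth, here is a rough sketch of what I mean by separating the three numbers. It is only an illustration, not a proper benchmark harness: the helper names (`time_startup`, `time_import`, `time_run`) are made up, `xml.etree.ElementTree` is just a stand-in for whatever library is under test, and the tiny in-memory document is an invented workload. Nothing is printed until all the timing is done.

```python
import subprocess
import sys
import time


def time_startup(repeats=5):
    """Best-of-N wall time to launch the interpreter and exit at once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def time_import(module_name):
    """Wall time of a cold import, measured inside a fresh interpreter
    so that modules already cached in this process do not hide the cost."""
    code = (
        "import time; t = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - t)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True, capture_output=True, text=True,
    )
    return float(result.stdout)


def time_run(func, *args, repeats=5):
    """Best-of-N wall time of the function itself, with no output inside."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    import xml.etree.ElementTree as ET  # stand-in for the tool under test

    # Invented workload: a small document built in memory.
    document = "<root>" + "<item/>" * 1000 + "</root>"

    startup = time_startup()
    imported = time_import("xml.etree.ElementTree")
    run = time_run(ET.fromstring, document)

    # Report only after every measurement has finished.
    print(f"startup: {startup:.4f}s")
    print(f"import:  {imported:.4f}s")
    print(f"run:     {run:.6f}s")
```

Taking the best of several repeats is one common way to cut down on noise from the rest of the system; you may prefer a mean or median. If you want a per-module breakdown of import cost, CPython 3.7+ also has `python -X importtime`.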