I’ve spent several hours optimizing Parrot over the past few months. In particular, I’ve concentrated on the build process for Rakudo (Perl 6 on Parrot), as it exercises a lot of parts of Parrot. We don’t yet have accurate numbers on the improvements, but rough figures show that the parts of the build process I’ve optimized will be about twice as fast as they were three months ago, despite Rakudo having grown tremendously since then.

Some of this comes from luck, some comes from a deepening knowledge of Parrot internals, a lot of it comes thanks to Callgrind and KCacheGrind, and some of it is experience. My instincts are improving.

Despite some very deliberate differences between Parrot and the Perl 5 implementation, there are also some similarities. In particular, the default runcores of the two virtual machines are very similar. For every operation performed (that is, every logical operation expressed in Perl 5 or PIR source code), the default runcore dispatches to a C function which performs the operation and returns the next op.

The default Perl 5 runloop checks for pending signal delivery after each op, before looping again.
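
To make that shape concrete, here is a minimal toy sketch of such a runloop in C. It is not Perl's or Parrot's code: every name in it (toy_op, toy_runops, toy_async_check and so on) is hypothetical, and the "signal" is just a flag. It only illustrates the pattern of dispatching to a function which returns the next op, then checking for pending signals before looping again.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not Perl's or Parrot's. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*fn)(toy_op *self);   /* performs the op, returns the next op */
    toy_op *next;                  /* successor in this straight-line program */
};

static long toy_acc = 0;                            /* state the ops act on */
static volatile sig_atomic_t toy_pending_signal = 0; /* set when a "signal" arrives */
static long toy_signals_handled = 0;

/* A trivial op: bump the accumulator and hand back the next op. */
static toy_op *toy_increment(toy_op *self)
{
    toy_acc++;
    return self->next;
}

/* Stand-in for the per-op check: almost always a cheap no-op. */
static void toy_async_check(void)
{
    if (toy_pending_signal) {
        toy_pending_signal = 0;
        toy_signals_handled++;
    }
}

/* Dispatch ops until one returns NULL; returns how many ops ran. */
static long toy_runops(toy_op *op)
{
    long dispatched = 0;
    while (op) {
        assert(op->fn != NULL);
        op = op->fn(op);       /* run the op, get the next one */
        dispatched++;
        toy_async_check();     /* the check that runs after every op */
    }
    return dispatched;
}
```

Chaining three toy_increment ops and calling toy_runops on the first one dispatches three ops and runs the check three times, even though at most one of those checks has anything to do.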

One of the best optimization strategies is to understand what happens frequently and what happens infrequently, and to try to make infrequent things mostly free — or at least cheap. Signal delivery in Perl 5 is pretty infrequent.

I reasoned, without profiling, that eliminating the check for signal delivery might provide a small performance improvement to Perl 5 code. You can’t eliminate signal delivery overall without removing features, but I had the notion that it’s possible to write code which runs when installing a signal handler (there are hooks for this) and replaces the default “Don’t check signals” runcore with a runcore which does check signals. Perl Hacks demonstrates how to replace runloops (see also Runops::Trace), so it’s doable.
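
The swapping idea might look roughly like the following toy sketch. Everything here is an assumption made for illustration: the swap_* names are hypothetical, the "hook" is an ordinary function I call by hand, and real Perl would need actual hook and runloop-replacement code instead. The point is only that the active runloop is a function pointer, so installing a handler can repoint it from a loop without checks to a loop with them.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical toy VM; none of these names exist in Perl. */
typedef struct swap_op swap_op;
struct swap_op {
    swap_op *(*fn)(swap_op *self);  /* performs the op, returns the next op */
    swap_op *next;
};

static long swap_checks = 0;                    /* how often we polled for signals */
static volatile sig_atomic_t swap_pending = 0;  /* set when a "signal" arrives */

/* An op which does nothing but continue to its successor. */
static swap_op *swap_nop(swap_op *self)
{
    return self->next;
}

/* Fast loop: no per-op signal check at all. */
static void swap_runops_fast(swap_op *op)
{
    while (op) {
        assert(op->fn != NULL);
        op = op->fn(op);
    }
}

/* Checking loop: polls for a pending signal after every op. */
static void swap_runops_checked(swap_op *op)
{
    while (op) {
        assert(op->fn != NULL);
        op = op->fn(op);
        swap_checks++;
        if (swap_pending)
            swap_pending = 0;   /* a real VM would run the handler here */
    }
}

/* The active runloop; starts out as the fast one. */
static void (*swap_runops)(swap_op *) = swap_runops_fast;

/* Stand-in for a hook which fires when a signal handler is installed. */
static void swap_on_handler_installed(void)
{
    swap_runops = swap_runops_checked;
}
```

In this toy, a loop which is already running never notices the swap; only the next call through swap_runops picks up the checking loop. Handling that case is one of the things a real implementation would have to work out.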

I updated my copy of bleadperl, built a fresh version, and then looked for some long-running code to profile. t/op/pack.t looked likely — it’s the second largest test file of a Perl operator, and it’s not heavily tied to regex performance as is the largest test file. I ran it through callgrind.

30 seconds later, I had some performance data (amended):

Profiled target:  ./perl -Ilib t/op/pack.t (PID 4285, part 1)

770,891,351  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Ir  file:function
--------------------------------------------------------------------------------
38,060,996  ???:Perl_sv_setsv_flags [/home/chromatic/dev/bleadperl/perl]
30,148,727  hooks.c:mem2chunk_check [/usr/lib/debug/libc-2.6.1.so]
28,032,914  ???:Perl_sv_upgrade [/home/chromatic/dev/bleadperl/perl]
25,980,923  malloc.c:_int_malloc [/usr/lib/debug/libc-2.6.1.so]
21,486,973  memmove.c:memmove [/usr/lib/debug/libc-2.6.1.so]
20,980,121  ???:Perl_runops_standard [/home/chromatic/dev/bleadperl/perl]

I edited Perl_runops_standard in run.c, removing the PERL_ASYNC_CHECK() line:

int
Perl_runops_standard(pTHX)
{
    dVAR;
    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

    TAINT_NOT;
    return 0;
}

… rebuilt, and re-ran the benchmark:

Profiled target:  ./perl -Ilib t/op/pack.t (PID 4367, part 1)
762,975,114  PROGRAM TOTALS

38,060,996  ???:Perl_sv_setsv_flags [/home/chromatic/dev/bleadperl/perl]
30,148,727  hooks.c:mem2chunk_check [/usr/lib/debug/libc-2.6.1.so]
28,032,914  ???:Perl_sv_upgrade [/home/chromatic/dev/bleadperl/perl]
25,980,923  malloc.c:_int_malloc [/usr/lib/debug/libc-2.6.1.so]
21,486,973  memmove.c:memmove [/usr/lib/debug/libc-2.6.1.so]
...
13,114,584  ???:Perl_runops_standard [/home/chromatic/dev/bleadperl/perl]

That’s a 1.03% reduction in instructions executed, which is almost statistical noise. At five or six percent or more, it might have been worth considering. It looks like this optimization isn’t worth the work it would take to figure out runloop swapping.
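
For anyone checking the arithmetic, the figure comes straight from the two PROGRAM TOTALS lines in the callgrind output. The helper below is a small sketch (percent_saved is a name I made up for this purpose) which computes the relative reduction between a before and an after instruction count.

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`.
 * Hypothetical helper for checking the numbers in the profiles above. */
static double percent_saved(unsigned long long before, unsigned long long after)
{
    assert(before > 0 && after <= before);
    return 100.0 * (double)(before - after) / (double)before;
}
```

Calling percent_saved(770891351ULL, 762975114ULL) gives roughly 1.03 for the whole program. Calling percent_saved(20980121ULL, 13114584ULL) on the Perl_runops_standard lines gives roughly 37.5, so the runloop itself got much cheaper; it just never accounted for much of the total in the first place.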

It’s not a bad investment of half an hour, but performance improvements in Perl will have to come from somewhere else. In particular, the 425-line monster Perl_sv_setsv_flags looks likely….