Working with complex, nested data structures takes practice and patience. It helps to be able to visualize your data. Data::Dumper is one of the oldest and most widely used modules because it does what it says – it serializes a Perl data structure to its equivalent Perl code.

It’s not a perfect module, though. Its default output is a little verbose (if customizable), it can use a lot of memory, and it can be slow. It also doesn’t handle complex references well.
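
As a rough illustration of the "if customizable" part: Data::Dumper reads several package variables that shape its output. This sketch sets three of them ($Data::Dumper::Indent, $Data::Dumper::Sortkeys, and $Data::Dumper::Terse). The dump_compact() helper and the sample hash are made up for the example, not part of the module.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: dump one value with quieter settings.
sub dump_compact
{
    my ($value) = @_;

    # local() restores the global settings when this sub returns
    local $Data::Dumper::Indent   = 1;  # fixed, shallow indentation
    local $Data::Dumper::Sortkeys = 1;  # stable hash key order
    local $Data::Dumper::Terse    = 1;  # no leading "$VAR1 = "

    return Dumper( $value );
}

my $config = { foo => 1, bar => 2, baz => 3 };  # made-up sample data
print dump_compact( $config );
```

Localizing the variables keeps the change from leaking into other code that calls Dumper(), which matters because these settings are global.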

Data::Dump::Streamer is a newer alternative that works better in some cases. Here’s what I learned from playing with it one afternoon.

Inside DDS

The API is similar, but not exactly equivalent, to the Data::Dumper interface. In particular, there are separate analysis and output phases. The Dumper() equivalent seems to be:

  print Dump( $some_var )->Out();

You must always call Dump() or some alternative before Out().
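
To make the two phases a little more visible, here is a minimal sketch that keeps the dumper object around between them. It assumes Data::Dump::Streamer is installed and relies on my reading of its documentation: Dump() returns an object in scalar context, Out() returns a string in scalar context, and To() sets an output handle. The $settings hash is made up for the example.

```perl
use strict;
use warnings;
use Data::Dump::Streamer;

my $settings = { foo => 1, bar => 2 };  # made-up sample data

# Analysis phase: in scalar context, Dump() returns a dumper object
my $dumper = Dump( $settings );

# Output phase: in scalar context, Out() returns the generated code
my $code = $dumper->Out();
print $code;

# With a destination set through To(), Out() writes to that handle instead
Dump( $settings )->To( \*STDOUT )->Out();
```

Holding on to the object is what lets you set options between analysis and output.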

Don’t Lose Lexical Names

For the simple case, the two modules are roughly equivalent. There are some other nice features, too. DumpLex() is very handy if you have PadWalker installed:

  use Data::Dump::Streamer;

  my $some_hash = { foo => 1, bar => 2, baz => 3 };
  print DumpLex( $some_hash )->Out();

… produces:

  $some_hash = {
                 bar => 2,
                 baz => 3,
                 foo => 1
               };

This doesn’t always work though; you have to have access to the pad:

  use Data::Dump::Streamer;

  {
      my %closed_over;

      sub foo
      {
          $closed_over{foo}++;
      }

      sub bar
      {
          $closed_over{bar}++;
      }

      sub get_co
      {
          return \%closed_over;
      }
  }

  foo();
  bar();
  print DumpLex( get_co() )->Out();

… prints:

  Use of uninitialized value in substitution (s///)...
  Use of uninitialized value in substitution (s///)...
  $HASH1 = {
             bar => 1,
             foo => 1
           };

That is, if you’re outside of the scope of the variable, DDS can’t (easily or reliably) get the lexical’s name.

If you want the serialized output to be slightly better than code that assigns to global variables (the kind you reload with do somefile.pl), use the Declare() method to make it declare lexicals:

  use Data::Dump::Streamer;

  my ($x, $y);
  ($x, $y) = \($y, $x);

  print Dump( $x, $y )->Declare( 1 )->Out();

… produces:

  my $REF1 = 'R: $REF2';
  my $REF2 = \$REF1;
  $REF1 = \$REF2;

(This is probably more useful than this example makes it seem.)

Peek Inside Subroutine References

One of the nicest features is that dumping objects containing closures works:

  use Data::Dump::Streamer;

  my %closed_over;
  my %held_subs = 
  (
      foo => sub { $closed_over{foo}++ },
      bar => sub { $closed_over{bar}++ },
  );

  my $object = bless \%held_subs, 'Some::Class';

  print DumpLex( $object )->Out();

… produces:

  my (%closed_over);
  %closed_over = ();
  $object = bless( {
              bar => sub {
                       use warnings;
                       use strict 'refs';
                       $closed_over{'bar'}++;
                     },
              foo => sub {
                       use warnings;
                       use strict 'refs';
                       $closed_over{'foo'}++;
                     }
            }, 'Some::Class' );

Yes, that does imply that dumping subroutine references works too. (The extra use lines in the dumped subroutines come from B::Deparse, not Data::Dump::Streamer.)
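
If you're curious what B::Deparse contributes on its own, this small sketch calls it directly. B::Deparse ships with Perl, and coderef2text() is its documented method for turning a code reference back into source text. The $increment sub is made up for the example, and the exact pragma lines in the output depend on what was in effect where the sub was compiled.

```perl
use strict;
use warnings;
use B::Deparse;

my $increment = sub { my $n = shift; return $n + 1 };  # made-up sample sub

# coderef2text() returns the body of the sub as Perl source, braces included
my $deparser = B::Deparse->new;
my $body     = $deparser->coderef2text( $increment );

print "sub $body\n";

# The text is ordinary Perl, so it can be compiled back into a code reference
my $clone = eval "sub $body" or die $@;
print $clone->( 41 ), "\n";
```

Note that deparsing alone recovers only the code. The variables a closure captured are a separate problem, which is the part DDS handles in the example above.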

Trying to Break Things

Arguably, DDS handles a few pathological cases better than Data::Dumper:

  use Data::Dumper;
  use Data::Dump::Streamer;

  my ($x, $y);
  ($x, $y) = \($y, $x);

  print Dumper( $x, $y );

  print "\n";

  print Dump( $x, $y )->Out();

… produces:

  $VAR1 = \\$VAR1;
  $VAR2 = ${$VAR1};

  $REF1 = \$REF2;
  $REF2 = \$REF1;

When and Why Streaming Matters

One of my biggest frustrations with Data::Dumper is that it builds the entire serialized string in memory before writing any of it. That can take a while. I don’t have a good example of this, but here’s a test program that builds a deep data structure and serializes it.

  use Data::Dumper;
  use Data::Dump::Streamer;

  my $data = {};
  my $top  = $data;

  for ( 1 .. 5000 )
  {
      $data = $data->{foo} = {};
  }

  print Dumper( $top );
  # print DumpLex( $top )->Out;

I ran this a couple of times with 1000 iterations and a couple of times with 5000 iterations. (I also redirected STDOUT to /dev/null to remove some of the IO timing.) This is not scientific and barely a benchmark, but the results are interesting.

With the 1000-level hash reference, Data::Dumper finished in under a second, while Data::Dump::Streamer took just over two seconds. For the larger hash reference, DDS took around 14 seconds, while Data::Dumper took between 48 and 60 seconds. Data::Dumper also used around twice as much memory, at least according to top (both virtual and resident).

I don’t deal with data structures that complex very often (and the overhead of that much IO probably matches the overhead of visiting such data structures), but the other convenience features of DDS make it compelling.
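
For anyone who wants to repeat the comparison with numbers instead of a stopwatch, here is a rough timing harness. It is a sketch, not the program I ran: build_nested() and time_it() are names invented for the example, the depth is deliberately small, and the DDS timing only runs if the module happens to be installed.

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes 'time';

# Optional: only time DDS if it is installed
my $has_dds = eval { require Data::Dump::Streamer; 1 };

# Build a chain of nested hash references, $depth levels deep
sub build_nested
{
    my ($depth) = @_;
    my $top     = {};
    my $cursor  = $top;

    $cursor = $cursor->{foo} = {} for 1 .. $depth;

    return $top;
}

# Run a code reference once and return the elapsed wall-clock seconds
sub time_it
{
    my ($code) = @_;
    my $start  = time;
    $code->();
    return time - $start;
}

my $depth = 200;   # deliberately small; raise it to repeat the larger runs
my $top   = build_nested( $depth );

printf "Data::Dumper:         %.3fs\n",
    time_it( sub { my $out = Dumper( $top ) } );

printf "Data::Dump::Streamer: %.3fs\n",
    time_it( sub { my $out = Data::Dump::Streamer::Dump( $top )->Out() } )
    if $has_dds;
```

One caveat if you raise the depth: as far as I know, newer Data::Dumper releases stop at a recursion limit set by $Data::Dumper::Maxrecurse (1000 by default), so the 5000-level case may need that limit raised or set to 0 first.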

A better benchmark might also measure latency, that is, how long each module takes to produce its first output. I expect DDS to start producing output sooner, which can be important in some contexts, such as web programming. (I ran the test again without redirecting the output. I interrupted the Data::Dumper version after almost 14 seconds and there was no output. I interrupted the Data::Dump::Streamer version at almost the same time and it had finished its output.)

Concluding Thoughts

I usually use YAML for peering at complex data structures, but DDS works really well as a code-aware serialization module. If you use Data::Dumper often, try Data::Dump::Streamer for a few days instead. Its documentation explains a few other convenient features you might not have realized that you missed.