Working with complex, nested data structures takes practice and patience. It
helps to be able to visualize your data. Data::Dumper is
one of the oldest and most widely used modules because it does what it says –
it serializes a Perl data structure to its equivalent Perl code.
It’s not a perfect module, though. Its default output is a little verbose (if customizable), it can use a lot of memory, and it can be slow. It also doesn’t handle complex references well.
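As a minimal sketch of what "customizable" means here, the following uses three of Data::Dumper's documented configuration variables to trim the default output. The `$config` data is a hypothetical sample, not something from the original text.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data, chosen only for illustration.
my $config = { foo => 1, bar => 2, baz => [ 1, 2, 3 ] };

my $dumped = do {
    # Localize the settings so they apply only inside this block.
    local $Data::Dumper::Terse    = 1;    # omit the leading "$VAR1 = "
    local $Data::Dumper::Indent   = 1;    # fixed, shallow indentation
    local $Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted order
    Dumper($config);
};

print $dumped;
```

Localizing the variables keeps the tweaks from leaking into other code that also calls Dumper().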
Data::Dump::Streamer
is a newer alternative that works better in some cases. Here’s what I learned
from playing with it one afternoon.
Inside DDS
The API is similar, but not exactly equivalent, to the
Data::Dumper interface. In particular, analysis and output are
separate phases. The Dumper() equivalent seems to be:
print Dump( $some_var )->Out();
You must always call Dump() or some alternative
before Out().
Don’t Lose Lexical Names
For the simple case, the two modules are roughly equivalent. There are
some other nice features, too. DumpLex() is very handy if you
have PadWalker installed:
use Data::Dump::Streamer;
my $some_hash = { foo => 1, bar => 2, baz => 3 };
print DumpLex( $some_hash )->Out();
… produces:
$some_hash = {
bar => 2,
baz => 3,
foo => 1
};
This doesn’t always work though; you have to have access to the pad:
use Data::Dump::Streamer;
{
    my %closed_over;
    sub foo
    {
        $closed_over{foo}++;
    }
    sub bar
    {
        $closed_over{bar}++;
    }
    sub get_co
    {
        return \%closed_over;
    }
}
foo();
bar();
print DumpLex( get_co() )->Out();
… prints:
Use of uninitialized value in substitution (s///)...
Use of uninitialized value in substitution (s///)...
$HASH1 = {
bar => 1,
foo => 1
};
That is, if you’re outside of the scope of the variable,
DDS can’t (easily or reliably) get the lexical’s name.
If you want the dumped code to use something slightly better than
global variables (for example, when you plan to reload it with
do somefile.pl), call the Declare() method so that the output
declares lexicals:
use Data::Dump::Streamer;
my ($x, $y);
($x, $y) = \($y, $x);
print Dump( $x, $y )->Declare( 1 )->Out();
… produces:
my $REF1 = 'R: $REF2';
my $REF2 = \$REF1;
$REF1 = \$REF2;
(This is probably more useful than this example makes it seem.)
Peek Inside Subroutine References
One of the nicest features is that dumping objects containing closures works:
use Data::Dump::Streamer;
my %closed_over;
my %held_subs =
(
    foo => sub { $closed_over{foo}++ },
    bar => sub { $closed_over{bar}++ },
);
my $object = bless \%held_subs, 'Some::Class';
print DumpLex( $object )->Out();
… produces:
my (%closed_over);
%closed_over = ();
$object = bless( {
bar => sub {
use warnings;
use strict 'refs';
$closed_over{'bar'}++;
},
foo => sub {
use warnings;
use strict 'refs';
$closed_over{'foo'}++;
}
}, 'Some::Class' );
Yes, that does imply that dumping subroutine references works too. (The
extra use lines in the dumped subroutines come from
B::Deparse, not Data::Dump::Streamer.)
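To see where that subroutine text comes from, here is a minimal sketch that calls B::Deparse directly on a code reference. It illustrates the core module's coderef2text() method only; it is not a description of how Data::Dump::Streamer is implemented internally.

```perl
use strict;
use warnings;
use B::Deparse;

my %closed_over;
my $sub = sub { $closed_over{foo}++ };

# coderef2text() returns the body of the subroutine as Perl source,
# including the enclosing braces.
my $deparser = B::Deparse->new;
my $source   = $deparser->coderef2text($sub);

print "sub $source\n";
```

On its own, B::Deparse reproduces only the code, not the current values of closed-over variables such as %closed_over.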
Trying to Break Things
Arguably, DDS handles a few pathological cases better than
Data::Dumper:
use Data::Dumper;
use Data::Dump::Streamer;
my ($x, $y);
($x, $y) = \($y, $x);
print Dumper( $x, $y );
print "\n";
print Dump( $x, $y )->Out();
… produces:
$VAR1 = \\$VAR1;
$VAR2 = ${$VAR1};

$REF1 = \$REF2;
$REF2 = \$REF1;
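One way to judge how well a dumper copes with cyclic data is to evaluate its output and check whether the cycle survives. The sketch below does that for Data::Dumper only, using its documented `$Data::Dumper::Purity` setting, which makes it emit extra fix-up statements for self-references. The `$node` structure is a hypothetical example, simpler than the mutual references above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical self-referential structure: the hash holds a reference to itself.
my $node = { name => 'root' };
$node->{self} = $node;

my $dumped = do {
    # Purity asks Data::Dumper for output that can rebuild self-references.
    local $Data::Dumper::Purity = 1;
    Dumper($node);
};
print $dumped;

# The dumped code assigns to $VAR1, so declare it before evaluating.
my $VAR1;
eval $dumped;
die $@ if $@;

print "cycle preserved\n" if $VAR1->{self} == $VAR1;
```

The same round-trip check could be applied to the output of any dumper that produces Perl code.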
When and Why Streaming Matters
One of my biggest frustrations with Data::Dumper is that it
builds the entire serialized string in memory before writing any of it. That
can take a while. I don’t have a good example of this, but here’s a test
program that builds a deep data structure and serializes it.
use Data::Dumper;
use Data::Dump::Streamer;
my $data = {};
my $top = $data;
for ( 1 .. 5000 )
{
    $data = $data->{foo} = {};
}
print Dumper( $top );
# print DumpLex( $top )->Out;
I ran this a couple of times with 1000 iterations and a couple of times with 5000 iterations. (I also redirected STDOUT to /dev/null to remove some of the IO timing.) This is not scientific and barely a benchmark, but the results are interesting.
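For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the harness used for the numbers in this article: the helper names build_nested() and time_it() are hypothetical, the depth is kept small so it runs quickly, and Data::Dump::Streamer is timed only if it happens to be installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Data::Dumper;

# Build a hash reference nested $depth levels deep, as in the test program.
sub build_nested {
    my ($depth) = @_;
    my $top  = {};
    my $data = $top;
    $data = $data->{foo} = {} for 1 .. $depth;
    return $top;
}

# Run a code reference once and return the elapsed wall-clock seconds.
sub time_it {
    my ($code) = @_;
    my $start = time;
    $code->();
    return time - $start;
}

# Assumed small depth for a quick run; raise it to approach the article's sizes.
my $depth = 100;
my $top   = build_nested($depth);

my %elapsed;
$elapsed{'Data::Dumper'} = time_it( sub { my $out = Dumper($top) } );

# Data::Dump::Streamer is not a core module, so load it only if available.
if ( eval { require Data::Dump::Streamer; 1 } ) {
    $elapsed{'Data::Dump::Streamer'} = time_it(
        sub { my $out = join '', Data::Dump::Streamer::Dump($top)->Out() }
    );
}

printf "%-22s %.4f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

Note that recent Data::Dumper releases limit recursion depth through `$Data::Dumper::Maxrecurse` (1000 by default, as far as I know), so very deep structures may need that variable set to 0 before they can be dumped at all.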
With the 1000-level hash reference, Data::Dumper finished
in under a second, while Data::Dump::Streamer took just over
two seconds. For the larger hash reference, DDS took around 14
seconds, while Data::Dumper took between 48 and 60 seconds.
Data::Dumper also used around twice as much memory, at least
according to top (both virtual and resident).
I don’t deal with data structures that complex very often (and the
overhead of that much IO probably matches the overhead of visiting such
data structures), but the other convenience features of DDS
make it compelling.
A better benchmark might also test the latency of requests — that is, I
expect DDS to start producing output sooner, which can be
important in some contexts — say, web programming. (I ran the test again
without redirecting the output. I interrupted the Data::Dumper
version after almost 14 seconds and there was no output. I interrupted the
Data::Dump::Streamer version at almost the same time and it
had finished its output.)
Concluding Thoughts
I usually use YAML for peering at complex data structures, but
DDS works really well as a code-aware serialization module. If
you use Data::Dumper often, try Data::Dump::Streamer
for a few days instead. Its documentation explains a few other convenient
features you might not have realized that you missed.
