Introduction

Over the next few weeks, I will be posting short blog entries about things I tripped over when I first learned Ruby. They will be brief blurbs that will hopefully help new users either understand a trap they’ve fallen into or avoid one altogether.

I hope the RubyGems folks forgive me for the terrible pun, but these are meant to be little gems of information for those new to Ruby. If you are an experienced Rubyist, these posts will probably bore you, so be warned!

The Hash Initialization Problem

A fairly common need in programming is to give a hash a default value for keys that haven’t been set. For example, if you are dealing with a hash of numbers, you might want any key that isn’t there to map to zero.

There are two different ways you can use the hash constructor to do this. One is to pass the default value in as a parameter, as in Hash.new(0), and the other is to use a block, such as
Hash.new { |h,k| h[k] = 0 }

Now, in this simple example, these two pieces of code do the same thing from a user’s perspective. Assign either one to a, and you get results like this:

>> a[:foo]
=> 0
>> a[:foo] += 1
=> 1
>> a[:foo]
=> 1
>> a[:bar]
=> 0
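Here is the numeric case as a small word-count sketch. The word list is made up for illustration. It relies on Hash.new(0), so a word seen for the first time reads as zero.

```ruby
# Tally how often each word appears. Hash.new(0) makes a word that has
# not been seen yet read as 0, so the first += 1 yields 1.
words = %w[apple banana apple cherry banana apple]

counts = Hash.new(0)
words.each { |word| counts[word] += 1 }

puts counts["apple"]   # prints 3
puts counts["banana"]  # prints 2
puts counts["durian"]  # prints 0: a missing key falls back to the default
```

The parameter form is safe here because counts[word] += 1 is shorthand for counts[word] = counts[word] + 1. That computes a new Integer and stores it under the key, so it never modifies the shared default 0.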

Now, as a lazy coder who didn’t have a strong grasp of blocks when I started using Ruby, I much preferred the parameter form. However, there is a subtle difference between the two that can be very problematic if you aren’t careful.

The important thing to know is that the parameter form returns the exact same object for every missing key. The example above used a Fixnum (an Integer in Ruby 2.4 and later), which is an immediate value. There, a[:foo] += 1 stores a brand-new number under the key rather than changing the shared 0. If you use something mutable, like… say a string, and modify it in place, it’s not so simple. Take a look at the two chunks of code below, and note the difference.

irb(main):001:0> a = Hash.new("")
=> {}
irb(main):002:0> a[:foo]
=> ""
irb(main):003:0> a[:foo] << "bar"
=> "bar"
irb(main):004:0> a[:foo]
=> "bar"
irb(main):005:0> a[:train]
=> "bar"
irb(main):006:0> a = Hash.new { |h,k| h[k] = "" }
=> {}
irb(main):007:0> a[:foo]
=> ""
irb(main):008:0> a[:foo] << "bar"
=> "bar"
irb(main):009:0> a[:foo]
=> "bar"
irb(main):010:0> a[:train]
=> ""

See? The first bit of code shares one string object among all missing keys, so a[:foo] << "bar" modifies that shared default (and never actually stores anything under :foo). The second bit creates a new string for each key that is not yet mapped to a value. Though you might find the first behavior useful at times, the second is usually what you want, and this is the rule of thumb I go by to keep from getting snagged:

If I am using an immediate value (such as a number) for my default, I tend to use the parameter form. Otherwise, I tend to use the block form, especially when dealing with any kind of collection or string.
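Here is the collection case as a small sketch that groups words by their first letter. The word list and variable names are made up for illustration. It contrasts Hash.new([]) with the block form, following the same pattern as the string example above.

```ruby
words = %w[apple avocado banana blueberry cherry]

# Pitfall: Hash.new([]) hands back the SAME array for every missing key,
# and << mutates that shared array without storing anything in the hash.
shared = Hash.new([])
words.each { |w| shared[w[0]] << w }
p shared.empty?   # prints true
p shared["z"]     # prints ["apple", "avocado", "banana", "blueberry", "cherry"]

# Block form: the block runs once per missing key and stores a fresh array.
grouped = Hash.new { |h, k| h[k] = [] }
words.each { |w| grouped[w[0]] << w }
p grouped["a"]    # prints ["apple", "avocado"]
p grouped["b"]    # prints ["banana", "blueberry"]
p grouped.keys    # prints ["a", "b", "c"]
```

If you really do want a shared, read-only default, freezing it (for example, Hash.new([].freeze)) makes an accidental << raise an error instead of silently corrupting the default.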

Anyway, I hope this helps people understand what the two different initializers do and prevents some gotchas. Happy Hacking!