O'Reilly    
 Published on O'Reilly (http://oreilly.com/)


Leveraging Ruby's Standard Library: Appendix B - Ruby Best Practices

by Gregory Brown

Most of this book has emphasized that understanding how to use tools effectively is just as important as having a nice collection of tools. However, that’s not to say that knowing where to find the right tool for the job isn’t an invaluable skill to have. In this appendix, we’ll take a look at a small sampling of Ruby’s vast standard library. What you will find is that it is essentially a treasure chest of goodies designed to make your Ruby programs more enjoyable to write.


This excerpt is from Ruby Best Practices. Written by the developer of the Ruby project Prawn (prawn.majesticseacreature.com), this concise book explains how to design beautiful APIs and domain-specific languages, work with functional programming ideas and techniques that can simplify your code and make you more productive, write code that's readable and expressive, and much more. It's the perfect companion to The Ruby Programming Language.


Why Do We Need a Standard Library?

Because of RubyGems, we tend to leverage a lot of third-party software. For this reason, we are often more likely to resort to a Google search instead of a search of Ruby’s API documentation when we want to solve a problem that isn’t immediately handled in core Ruby. This isn’t necessarily a bad thing, but it is important not to overlook the benefits that come with using a standard library when it is available. When all else is equal, the gains you’ll get from using standard Ruby are easy to enumerate:

  • Ruby standard libraries are typically distributed with Ruby itself, which means that no extra software needs to be installed to make them work.

  • Standard libraries don’t change rapidly. Their APIs tend to be stable and mature, and will likely outlast your application’s development cycle. This removes the need for frequent compatibility updates that you might experience with third-party software.

  • Except for a few obvious exceptions, Ruby standard libraries are guaranteed to run anywhere Ruby runs, avoiding platform-specific issues.

  • Using standard libraries improves the understandability of your code, as they are available to everyone who uses Ruby. For open source projects, this might make contributions to your project easier, for the same reason.

These reasons are compelling enough to encourage us to check Ruby's standard library before doing a Google search for third-party libraries. However, the case may be more convincing with some practical examples of what can be accomplished without adding external dependencies.

I’ve handpicked 10 of the libraries I use day in and day out. This isn’t necessarily meant to be a “best of” sampling, nor is it meant to point out the top 10 libraries you need to know about. We’ve implicitly and explicitly covered many standard libraries throughout the book, and some of those may be more essential than what you’ll see here. However, I’m fairly certain that after reading this appendix, you’ll find at least a few useful tricks in it, and you’ll also get a clear picture of how diverse Ruby’s standard library is.

Be sure to keep in mind that while we’re looking at 10 examples here, there are more than 100 standard libraries packaged with Ruby, about half of which are considered mature. These vary in complexity from simple tools to solve a single task to full-fledged frameworks. Even though you certainly won’t need to be familiar with every last package and what it does, it’s important to be aware of the fact that what we’re about to discuss is just the tip of the iceberg.

Now, on to the fun. I've included this appendix because, to me, it embodies a big part of the joy of Ruby programming. I hope that when reading through the examples, you feel the same way.

Pretty-Printer for Ruby Objects (pp)

As I mentioned before, Ruby standard libraries run the gamut from extreme simplicity to deep complexity. I figured we’d kick things off with something in the former category.

If you’re reading this book, you’ve certainly made use of Kernel#p during debugging. This handy method, which calls #inspect on an object and then prints out its result, is invaluable for basic debugging needs. However, reading its output for even relatively modest objects can be daunting:

friends = [ { first_name: "Emily", last_name: "Laskin" },
            { first_name: "Nick",  last_name: "Mauro" },
            { first_name: "Mark",  last_name: "Maxwell" } ]

me = { first_name: "Gregory", last_name: "Brown", friends: friends }

p me # Outputs:

{:first_name=>"Gregory", :last_name=>"Brown", :friends=>[{:first_name=>"Emily",
:last_name=>"Laskin"}, {:first_name=>"Nick", :last_name=>"Mauro"}, {:first_name=
>"Mark", :last_name=>"Maxwell"}]}

We don’t typically write our code this way, because the structure of our objects actually means something to us. Luckily, the pp standard library understands this and provides much nicer human-readable output. The changes to use pp instead of p are fairly simple:

require "pp"

friends = [ { first_name: "Emily", last_name: "Laskin" },
            { first_name: "Nick",  last_name: "Mauro" },
            { first_name: "Mark",  last_name: "Maxwell" } ]

me = { first_name: "Gregory", last_name: "Brown", friends: friends }

pp me # Outputs:

{:first_name=>"Gregory",
 :last_name=>"Brown",
 :friends=>
  [{:first_name=>"Emily", :last_name=>"Laskin"},
   {:first_name=>"Nick", :last_name=>"Mauro"},
   {:first_name=>"Mark", :last_name=>"Maxwell"}]}

Just as with p, there is a hook for customizing pp's output for your own objects, called pretty_print. Here's a simple implementation that shows how you might use it:

require "pp"

class Person
  def initialize(first_name, last_name, friends)
    @first_name, @last_name, @friends = first_name, last_name, friends
  end

  def pretty_print(printer)
    printer.text "Person <#{object_id}>:\n" <<
                 "  Name: #@first_name #@last_name\n  Friends:\n"
    @friends.each do |f|
      printer.text "    #{f[:first_name]} #{f[:last_name]}\n"
    end
  end

end

friends = [ { first_name: "Emily", last_name: "Laskin" },
            { first_name: "Nick",  last_name: "Mauro" },
            { first_name: "Mark",  last_name: "Maxwell" } ]

person = Person.new("Gregory", "Brown", friends)
pp person #=> outputs:

Person <1013900>:
  Name: Gregory Brown
  Friends:
    Emily Laskin
    Nick Mauro
    Mark Maxwell

As you can see here, pretty_print takes an argument, which is an instance of the current pp object. Because pp inherits from PrettyPrint, a class provided by Ruby’s prettyprint standard library, it provides a whole host of formatting helpers for indenting, grouping, and wrapping structured data output. We’ve stuck with the raw text() call here, but it’s worth mentioning that there is a lot more available to you if you need it.
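To give a taste of those helpers, here is a small sketch using group and breakable, which let pp decide where to wrap based on the available width. The TaggedList class is invented purely for illustration:

```ruby
require "pp"

# TaggedList is a hypothetical class, made up for this example.
class TaggedList
  def initialize(tag, items)
    @tag, @items = tag, items
  end

  def pretty_print(printer)
    # group(indent, open, close) wraps its contents in delimiters; each
    # breakable renders as a space if the group fits on one line, or as a
    # newline plus indentation if it doesn't.
    printer.group(2, "#{@tag}[", "]") do
      @items.each_with_index do |item, i|
        printer.text "," unless i.zero?
        printer.breakable
        printer.pp item
      end
    end
  end
end

pp TaggedList.new("colors", %w[red green blue])
```

Because the output goes through the printer object, the same definition automatically adapts to narrow terminals: widen the list enough and pp will break each element onto its own indented line.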

A benefit of indirectly displaying your output through a printer object is that it allows pp to give you an inspect-like method that returns a string. Try person.pretty_print_inspect to see how this works. The string represents exactly what would be printed to the console, just like obj.inspect would. If you wish to use pretty_print_inspect as your default inspect method (and therefore make p and pp work the same), you can do so easily with an alias:

class Person
  # other code omitted

  alias_method :inspect, :pretty_print_inspect
end

Generally speaking, pp does a pretty good job of rendering the debugging output for even relatively complex objects, so you may not need to customize its behavior often. However, if you do have a need for specialized output, you’ll find that the pretty_print hook provides something that is actually quite a bit more powerful than Ruby’s default inspect hook, and that can really come in handy for certain needs.

Working with HTTP and FTP (open-uri)

Like most other modern programming languages, Ruby ships with libraries for working with some of the most common network protocols, including FTP and HTTP. However, the Net::FTP and Net::HTTP libraries are designed primarily for heavy lifting at the low level. They are great for this purpose, but they leave something to be desired when all that is needed is to grab a remote file or do some basic web scraping. This is where open-uri shines.

The way open-uri works is by patching Kernel#open to accept URIs. This means we can directly open remote files and work with them. For example, here’s how we’d print out Ruby’s license using open-uri:

require "open-uri"

puts open("http://www.ruby-lang.org/en/LICENSE.txt").read #=>

"Ruby is copyrighted free software by Yukihiro Matsumoto <matz@netlab.co.jp>.
You can redistribute it and/or modify it under either the terms of the GPL
(see COPYING.txt file), or the conditions below: ..."

If we encounter an HTTP error, an OpenURI::HTTPError will be raised, including the relevant error code:

>> open("http://majesticseacreature.com/a_totally_missing_document")
OpenURI::HTTPError: 404 Not Found
        from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
        from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
        from (irb):10
        from /usr/local/lib/ruby/1.8/uri/generic.rb:250

>> open("http://prism.library.cornell.edu/control/authBasic/authTest/")
OpenURI::HTTPError: 401 Authorization Required
        from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
        from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
        from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
        from (irb):7
        from /usr/local/lib/ruby/1.8/uri/generic.rb:250

The previous example hinted at another feature of open-uri: HTTP basic authentication. Notice what happens when we provide a username and password while accessing the same URI:

>> open("http://prism.library.cornell.edu/control/authBasic/authTest/",
?>   :http_basic_authentication => ["test", "this"])
=> #<StringIO:0x2d1810>

Success! You can see here that open-uri represents the returned file as a StringIO object, which is why we can call read to get its contents. Of course, we can use most other I/O operations as well, but I won’t get into that here.
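Since the returned object quacks like an IO, the usual operations all apply. Here's a quick sketch using a plain StringIO to stand in for a small response body, so it runs without a network connection:

```ruby
require "stringio"

# A plain StringIO standing in for what open-uri returns on a small response.
response = StringIO.new("line one\nline two\nline three\n")

response.readline              #=> "line one\n"
response.each_line { |line| }  # iterate over the remaining lines
response.rewind                # seek back to the beginning
response.read(8)               #=> "line one"
```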

As I mentioned before, open-uri also wraps Net::FTP, so you could even do something like download Ruby with it:

open("ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p0.tar.bz2") do |o|
  File.open(File.basename(o.base_uri.path), "w") { |f| f << o.read }
end

Here we see that even though the object returned by open() is a StringIO object, it includes some extra metadata, such as the base_uri of your request. These helpers are provided by the OpenURI::Meta module, and are worth looking over if you need to get more than just the contents of a file back.

Although open-uri has some advanced features, it is most useful for the simple cases shown here. Because it returns a StringIO object, any reasonably flexible interface can be extended to support remote file downloads. For a practical example, we can take a look at Prawn's image embedding, which assumes only that an object you pass to it must respond to #read:

Prawn::Document.generate("remote_images.pdf") do
  image open("http://prawn.majesticseacreature.com/media/prawn_logo.png")
end

This feature was accidentally enabled when we allowed the image() method to accept Tempfile objects. Because open-uri smoothly integrates with the rest of Ruby, you might find situations where it can come in handy in a similar way in your own applications.

Working with Dates and Times (date)

Core Ruby has a Time class, but we will encounter a lot of situations where we also need to work with dates, or combinations of dates and times. Ruby’s date standard library gives us Date and DateTime, and extends Time with conversion methods for each of them. This library comes packed with a powerful parser that can handle all sorts of date formats, and a solid date formatting engine to output data based on a template. Here are just a couple of trivial examples to give you a sense of its flexibility:

>> Date.strptime("12/08/1985","%m/%d/%Y").strftime("%Y-%m-%d")
=> "1985-12-08"

>> Date.strptime("1985-12-08","%Y-%m-%d").strftime("%m/%d/%Y")
=> "12/08/1985"
>> Date.strptime("December 8, 1985","%b%e, %Y").strftime("%m/%d/%Y")
=> "12/08/1985"

Date objects can also be queried for all sorts of information, as well as manipulated to produce new dates:

>> date = Date.today
=> #<Date: 2009-02-09 (4909743/2,0,2299161)>

>> date + 1
=> #<Date: 2009-02-10 (4909745/2,0,2299161)>
>> date << 1
=> #<Date: 2009-01-09 (4909681/2,0,2299161)>
>> date >> 1
=> #<Date: 2009-03-09 (4909799/2,0,2299161)>

>> date.year
=> 2009
>> date.month
=> 2
>> date.day
=> 9
>> date.wday
=> 1
>> date + 36
=> #<Date: 2009-03-17 (4909815/2,0,2299161)>

Here we’ve just scratched the surface, but in the interest of keeping a quick pace, we’ll dive right into an example. So far, we’ve been looking at Date, but now we’re going to work with DateTime. The two are basically the same, except that the latter can hold time values as well:[19]

>> dtime = DateTime.now
=> #<DateTime: 2009-02-09T03:56:17-05:00 (...)>

>> [:month, :day, :year, :hour, :minute, :second].map { |attr| dtime.send(attr) }
=> [2, 9, 2009, 3, 56, 17]
>> dtime.between?(DateTime.now - 1, DateTime.now + 1)
=> true
>> dtime.between?(DateTime.now - 1, DateTime.now - 1)
=> false

What follows is a simplified look at a common problem: event scheduling. Basically, we want an object that provides functionality like this:

sched = Scheduler.new
sched.event "2009.02.04 10:00", "2009.02.04 11:30", "Eat Snow"
sched.event "2009.02.03 14:00", "2009.02.04 14:00", "Wear Special Suit"
sched.display_events_at '2009.02.04 10:20'

When given a specific date and time, display_events_at would look up all the events that were happening at that exact moment:

Events occurring around 10:20 on 02/04/2009
--------------------------------------------
14:00 (02/03) - 14:00 (02/04): Wear Special Suit
10:00 (02/04) - 11:30 (02/04): Eat Snow

If we look a little later in the day, we can see that in this particular example, although eating snow is a short-lived experience, the passion for wearing a special suit carries on:

sched.display_events_at '2009.02.04 11:45'

## OUTPUTS ##

Events occurring around 11:45 on 02/04/2009
--------------------------------------------
14:00 (02/03) - 14:00 (02/04): Wear Special Suit

As it turns out, implementing the Scheduler class is pretty straightforward, because DateTime objects can be used as endpoints in a Ruby range object. So when we look at these two events, what we’re really doing is something similar to this:

>> a = DateTime.parse("2009.02.03 14:00") .. DateTime.parse("2009.02.04 14:00")

>> a.cover?(DateTime.parse("2009.02.04 11:45"))
=> true

>> b = DateTime.parse("2009.02.04 10:00") .. DateTime.parse("2009.02.04 11:30")
>> b.cover?(DateTime.parse("2009.02.04 11:45"))
=> false

When you combine this with things we’ve already discussed, like flexible date parsing and formatting, you end up with a fairly vanilla implementation:

require "date"

class Scheduler

  def initialize
    @events = []
  end

  def event(from, to, message)
    @events << [DateTime.parse(from) .. DateTime.parse(to),  message]
  end

  def display_events_at(datetime)
    datetime = DateTime.parse(datetime)
    puts "Events occurring around #{datetime.strftime("%H:%M on %m/%d/%Y")}"
    puts "--------------------------------------------"
    events_at(datetime).each do |range, message|
      puts "#{time_abbrev(range.first)} - #{time_abbrev(range.last)}: #{message}"
    end
  end

  private

  def time_abbrev(datetime)
    datetime.strftime("%H:%M (%m/%d)")
  end

  def events_at(datetime)
    @events.each_with_object([]) do |event, matched|
      matched << event if event.first.cover?(datetime)
    end
  end

end

We can start by looking at how event() works:

def event(from, to, message)
  @events << [DateTime.parse(from) .. DateTime.parse(to),  message]
end

Here we see that each event is simply a tuple consisting of two elements: a datetime Range, and a message. We parse the strings on the fly using DateTime.parse. This method should typically be used with caution, as it is much more reliable to use Date.strptime, and much faster to construct a DateTime manually than it is to attempt to guess the date format. That having been said, there is no substitute when you cannot rely on a standardized date format, and it does a good job of providing a flexible interface when one is needed.

As this fairly pedestrian code completely covers storing events, what remains to be shown is how they are selectively retrieved and displayed. We’ll start with the helper method that looks up what events are going on at a particular time:

def events_at(datetime)
  @events.each_with_object([]) do |event, matched|
    matched << event if event.first.cover?(datetime)
  end
end

Here, we build up our list of matching events by simply iterating over the event list and including only those events whose datetime Range covers the time in question. This, along with the self-explanatory time_abbrev code, is used to keep display_events_at nice and clean:

def display_events_at(datetime)
  datetime = DateTime.parse(datetime)
  puts "Events occurring around #{datetime.strftime("%H:%M on %m/%d/%Y")}"
  puts "--------------------------------------------"
  events_at(datetime).each do |range, message|
    puts "#{time_abbrev(range.first)} - #{time_abbrev(range.last)}: #{message}"
  end
end

Here, we’re doing little more than parsing the date and time passed in as a string to get us a DateTime object, and then displaying the results of events_at. We take advantage of strftime for that, and recover the endpoints of the range to include in our output, to show exactly when an event starts and stops. There’s really not much more to it.

Although this example is obviously a bit oversimplified, you’ll find that similar problems crop up again and again. The key thing to remember is to take advantage of the ability of DateTime objects to be used within ranges, and whenever possible, to avoid parsing dates yourself. If you need finer granularity, use strptime(), but for many needs parse() will do the trick while providing a more flexible interface to your users.
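The difference matters most on ambiguous input. Date.parse has to guess at numeric dates, and its guess may not match what your users meant, while strptime pins the format down explicitly:

```ruby
require "date"

# Date.parse assumes day/month order for slash-separated numeric dates:
Date.parse("03/04/2009")                 #=> 2009-04-03 (April 3rd)

# Date.strptime applies exactly the format you specify:
Date.strptime("03/04/2009", "%m/%d/%Y")  #=> 2009-03-04 (March 4th)
```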

We’ve covered some of the most common uses of Ruby’s standard date library here, but there are of course plenty of other features for the edge case. As with the other topics in this appendix, hit up the API documentation if you need to know more.

Lexical Parsing with Regular Expressions (strscan)

Although Ruby’s String object provides many powerful features that rely on regular expressions, it can be cumbersome to build any sort of parser with them. Most operations that you can do directly on strings work on the whole string at once, providing MatchData that can be used to index into the original content. This is great when a single pattern fits the bill, but when you want to consume some text in chunks, switching up strategies as needed along the way, things get a little more hairy. This is where the strscan library comes in.

When you require strscan, it provides a class called StringScanner. The underlying purpose of using this object is that it keeps track of where you are in the string as you consume parts of it via regex patterns. Just to clear up what this means, we can take a look at the example used in the RDoc:

s = StringScanner.new('This is an example string')
s.eos?               # -> false

p s.scan(/\w+/)      # -> "This"
p s.scan(/\w+/)      # -> nil
p s.scan(/\s+/)      # -> " "
p s.scan(/\s+/)      # -> nil
p s.scan(/\w+/)      # -> "is"
s.eos?               # -> false

p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "an"
p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "example"
p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "string"
s.eos?               # -> true

p s.scan(/\s+/)      # -> nil
p s.scan(/\w+/)      # -> nil

From this simple example, it's easy to see that the index is advanced only when a match is made. Once the end of the string is reached, there is nothing left to match. Although this may seem a little simplistic at first, it forms the essence of what StringScanner does for us. We can see that by looking at how it is used in the context of something a little more real.
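Before we get there, a small tokenizer shows how the advancing index and the matched text combine in practice. This is a hypothetical sketch, not part of the strscan API:

```ruby
require "strscan"

# Hypothetical tokenizer: consume the string chunk by chunk, switching
# patterns as we go, and report the position of anything unexpected.
def tokenize(str)
  s = StringScanner.new(str)
  tokens = []
  until s.eos?
    if s.scan(/\d+/)
      tokens << [:number, s.matched.to_i]
    elsif s.scan(/[a-z]+/)
      tokens << [:word, s.matched]
    elsif s.scan(/\s+/)
      # skip whitespace without emitting a token
    else
      raise "Unexpected character at position #{s.pos}"
    end
  end
  tokens
end

tokenize("abc 123 def")
#=> [[:word, "abc"], [:number, 123], [:word, "def"]]
```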

We’re about to look at how to parse JSON (JavaScript Object Notation), but the example we’ll use is primarily for educational purposes, as it demonstrates an elegant use of StringScanner. If you have a real need for this functionality, be sure to look at the json standard library that ships with Ruby, as that is designed to provide the kind of speed and robustness you’ll need in production.

In Ruby Quiz #155, James Gray builds up a JSON parser by hand-rolling a recursive descent parser using StringScanner. He actually covers the full solution in depth on the Ruby Quiz website, but this abridged version focuses specifically on his use of StringScanner. To keep things simple, we’ll discuss roughly how he manages to get this small set of assertions to pass:

def test_array_parsing
  assert_equal(Array.new, @parser.parse(%Q{[]}))
  assert_equal( ["JSON", 3.1415, true],
    @parser.parse(%Q{["JSON", 3.1415, true]}) )
  assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
end

We can see by the general outline how this parser works:

require "strscan"

class JSONParser
  AST = Struct.new(:value)

  def parse(input)
    @input = StringScanner.new(input)
    parse_value.value
  ensure
    @input.eos? or error("Unexpected data")
  end

  private

  def parse_value
    trim_space
    parse_object or
    parse_array or
    parse_string or
    parse_number or
    parse_keyword or
    error("Illegal JSON value")
  ensure
    trim_space
  end


  # ...

end

Essentially, a StringScanner object is built up using the original JSON string. Then, the parser recursively walks down through the structure and parses the data types it encounters. Once the parsing completes, we expect that we’ll be at the end of the string, otherwise some data was left unparsed, indicating corruption.

Looking at the way parse_value is implemented, we see the benefit of using StringScanner. Before an actual value is parsed, whitespace is trimmed on both ends using the trim_space helper. This is exactly as simple as you might expect it to be:

def trim_space
  @input.scan(/\s+/)
end

Of course, to make things a little more interesting and to continue our walkthrough, we need to peel back the covers on parse_array:

def parse_array
  if @input.scan(/\[\s*/) #1
    array       = Array.new
    more_values = false
    while contents = parse_value rescue nil #2
      array << contents.value
      more_values = @input.scan(/\s*,\s*/) or break
    end
    error("Missing value") if more_values
    @input.scan(/\s*\]\s*/) or error("Unclosed array") #3
    AST.new(array)
  else
    false
  end
end

The beauty of JSON (and this particular parsing solution) is that it’s very easy to see what’s going on. On a successful parse, this code takes three simple steps. First, it detects the opening [, indicating the start of a JSON array. If it finds that, it creates a Ruby array to populate. Then, the second step is to parse out each value, separated by commas and optional whitespace. To do this, the parser simply calls parse_value again, taking advantage of recursion as we mentioned before. Finally, the third step is to seek a closing ], which, when found, ends this stage of parsing and returns a Ruby array wrapped in the AST struct this parser uses.

Going back to our three assertions, we can trace them one by one. The first one was meant to test parsing an empty array:

assert_equal(Array.new, @parser.parse(%Q{[]}))

This one is predictably the simplest to trace. When parse_value is called to capture the contents of the array, it will error out, because no JSON value starts with ]. James is using a clever trick that banks on a failed parse, because that allows him to short-circuit processing the contents. This error is swallowed, leaving the contents empty. The string is then scanned for the closing ], which is found, and an AST-wrapped empty Ruby array is returned.

The second assertion is considerably more tricky:

assert_equal( ["JSON", 3.1415, true],
  @parser.parse(%Q{["JSON", 3.1415, true]}) )

Here, we need to rely on parse_value’s ability to parse strings, numbers, and booleans. All three of these are done using techniques similar to those shown so far, but a string is a little hairy due to some tricky edge cases. However, to give you a few extra samples, we can take a look at the other two:

def parse_number
  @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\b/) and
    AST.new(eval(@input.matched))
end

def parse_keyword
  @input.scan(/\b(?:true|false|null)\b/) and
    AST.new(eval(@input.matched.sub("null", "nil")))
end

In both cases, James takes advantage of the similarities between Ruby and JSON when it comes to numbers and keywords, and essentially just evals the results after a bit of massaging. The numeric pattern is a little hairy, and you don’t necessarily need to understand it. Instead, the interesting thing to note about these two examples is their use of StringScanner#matched. As the name suggests, this method returns the actual string that was just matched by the pattern. This is a common way to extract values while conditionally scanning for matches.

This pretty much wraps up the interesting bits about getting the second assertion to pass. Here, the parser just keeps attempting to pull off new values if it can, while the array code wipes out any intermediate commas. Once the values are exhausted, the ] is then searched for, as before.

The third and final case for array parsing may initially seem complicated:

assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))

However, if you recall that the way parse_array works is to repeatedly call parse_value until all its elements are consumed, it's clear what is going on. Because an array can be parsed by parse_value just the same as any other value, the nested arrays have no trouble repeating the same process to find their elements, which can also be arrays. At some point, this process bottoms out, and the whole structure is built up. That means that we actually get to pass this third assertion for free, as the implementation already uses recursive calls through parse_value.

Although this doesn’t cover 100% of how James’s parser works, it gives you a good sense of when StringScanner might be a good tool to have around. You can see how powerful it is to keep a single reference to a StringScanner and use it in a number of different methods to consume a string part by part. This allows better decomposition of your program, and simplifies the code by removing some of the low-level plumbing from the equation. So next time you want to do regular expression processing on a string chunk by chunk rather than all at once, you might want to give StringScanner a try.

Cryptographic Hash Functions (digest)

Though it might not be something we do every day, having easy access to the common cryptographic hash functions can be handy for all sorts of things. The digest standard library provides several options, including MD5, SHA1, and SHA2. We'll cover three simple use cases here: calculating the checksum of a file, uniquely hashing files based on their content, and hashed password storage.

I won’t get into the details about the differences between various hashing algorithms or their limitations. Though they all have a potential risk for what is known as a collision, where two distinct content keys are hashed to the same value, this is rare enough to not need to worry about in most practical scenarios. Of course, if you’re new to encryption in general, you will want to read up on these techniques elsewhere before attempting to use them for anything nontrivial. Assuming that you accept this responsibility, we can move on to see how these hashing functions can be used in your Ruby applications.

We’ll start with checksums, because these are pretty easy to find in the wild. If you’ve downloaded open source software before, you’ve probably seen MD5 or SHA256 hashes before. I’ll be honest: most of the time I just ignore these, but they do come in handy when you want to verify that an automated download completed correctly. They’re also useful if you have a tendency toward paranoia and want to be sure that the file you are receiving is really what you think it is. Using the Ruby 1.9.1 release notes themselves as an example, we can see what a digitally signed file download looks like:

== Location
* ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p0.tar.bz2
 SIZE:   7190271 bytes
 MD5:    0278610ec3f895ece688de703d99143e
 SHA256: de7d33aeabdba123404c21230142299ac1de88c944c9f3215b816e824dd33321

Once we’ve downloaded the file, pulling the relevant hashes is trivial. Here’s how we’d grab the MD5 hash:

>> require "digest/md5"
=> true

>> Digest::MD5.hexdigest(File.binread("ruby-1.9.1-p0.tar.bz2"))
=> "0278610ec3f895ece688de703d99143e"

If we preferred the slightly more secure SHA256 hash, we don’t need to work any harder:

>> require "digest/sha2"
=> true
>> Digest::SHA256.hexdigest(File.binread("ruby-1.9.1-p0.tar.bz2"))
=> "de7d33aeabdba123404c21230142299ac1de88c944c9f3215b816e824dd33321"

As both of these match the release notes, we can be reasonably sure that nothing nefarious is going on, and also that our file integrity has been preserved. That’s the most common use of this form of hashing.
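That comparison is easy to wrap into a helper when you want to automate the check. The verified? method below is a hypothetical name, not part of the digest API:

```ruby
require "digest/sha2"

# Hypothetical helper: compare a downloaded file against a published
# SHA256 checksum before trusting its contents.
def verified?(path, expected_sha256)
  Digest::SHA256.hexdigest(File.binread(path)) == expected_sha256.downcase
end
```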

Of course, in addition to identifying a particular file uniquely, cryptographic hashes allow us to identify the uniqueness of a file’s content. If you’ve used the revision control system git, you may have noticed that the revisions are actually identified by SHA1 hashes that describe the changesets. We can do similar things in our Ruby applications.

For example, in Prawn, we support embedding images in PDF documents. Because these images can be from any number of sources ranging from a temp file to a directly downloaded image from the Web, we cannot rely on unique filenames mapping to unique images. Processing images can be pretty costly, especially when we do things like split out alpha channels for PNGs, so we want to avoid reprocessing images when we can avoid it. The solution to this problem is simple: we use SHA1 to generate a hexdigest for the image content and then use that as a key into a hash. A rough approximation of what we’re doing looks like this:

require "digest/sha1"

def image_registry(raw_image_data)
  img_sha1 = Digest::SHA1.hexdigest(raw_image_data)
  @image_registry[img_sha1] ||= build_image_obj(raw_image_data)
end

This technique clearly isn’t limited to PDF generation. To name just a couple of other use cases, I use a similar hashing technique to make sure the content of my blog has changed before reuploading the static files it generates, so it uploads only the files it needs. I’ve also seen this used in the context of web applications to prevent identical content from being copied again and again to new files. Fundamentally, these ideas are nearly identical to the previous code sample, so I won’t illustrate them explicitly.

However, while we’re on the topic of web applications, we can work our way into our last example: secure password storage.

It should be pretty obvious that even if we restrict access to our databases, we should not store passwords in clear text. We have a responsibility to offer users a reasonable amount of privacy, and through cryptographic hashing, even administrators can be kept in the dark about what individual users’ passwords actually are. Using the techniques already shown, we get most of the way to a solution.

The following example is from an ActiveRecord model as part of a Rails application, but it is fairly easily adaptable to any system in which the user information is remotely stored outside of the application itself. Regardless of whether you are familiar with ActiveRecord, the code should be fairly straightforward to follow with a little explanation. Everything except the relevant authentication code has been omitted, to keep things well focused:

class User < ActiveRecord::Base

  def password=(pass)
    @password = pass
    salt = [Array.new(6){rand(256).chr}.join].pack("m").chomp
    self.password_salt, self.password_hash =
      salt, Digest::SHA256.hexdigest(pass + salt)
  end

  def self.authenticate(username,password)
    user = find_by_username(username)
    if user.blank? ||
       user.password_hash != Digest::SHA256.hexdigest(password + user.password_salt)
      raise AuthenticationError, "Username or password invalid"
    end

    user
  end

end

Here we see two functions: one for setting an individual user’s password, and another for authenticating and looking up a user by username and password. We’ll start with setting the password, as this is the most crucial part:

def password=(pass)
  @password = pass
  salt = [Array.new(6){rand(256).chr}.join].pack("m").chomp
  self.password_salt, self.password_hash =
    salt, Digest::SHA256.hexdigest(pass + salt)
end

Here we see that the password is hashed using Digest::SHA256, in a similar fashion to our earlier examples. However, this password isn’t directly hashed, but instead, is combined with a salt to make it more difficult to guess. This technique has been shown in many Ruby cookbooks and tutorials, so you may have encountered it before. Essentially, what you are seeing here is that for each user in our database, we generate a random six-byte sequence and then pack it into a base64-encoded string, which gets appended to the password before it is hashed. This makes several common attacks much harder to execute, at a minimal complexity cost to us.
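If the pack incantation looks cryptic, it helps to run the salt step on its own. This sketch just isolates what that one-liner produces:

```ruby
# Six random bytes, base64-encoded via Array#pack("m"); chomp strips
# the trailing newline that the "m" directive appends.
bytes = Array.new(6) { rand(256).chr }.join
salt  = [bytes].pack("m").chomp

p salt         # something like "pQb3k1x2" -- different every run
p salt.length  # 6 raw bytes always encode to 8 base64 characters
```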

An important thing to notice is that what we store is the fingerprint of the password after it has been salted rather than the password itself, which means that we never store the original content and it cannot be recovered. So although we can tell whether a given password matches this fingerprint, the original password cannot be retrieved from the data we are storing.

If this more or less makes sense to you, the authenticate method will be easy to follow now:

def self.authenticate(username,password)
  user = find_by_username(username)
  if user.blank? ||
     user.password_hash != Digest::SHA256.hexdigest(password + user.password_salt)
    raise AuthenticationError, "Username or password invalid"
  end

  user
end

Here, we first retrieve the user from the database and check that the lookup actually found someone. If the username is valid, we look up the salt, append it to the bare password string, and call hexdigest again, because we never stored the actual password, only its salted hash. We then compare the resulting hash to the one stored in the database. If they match, we return our user object and all is well; if they don’t, an error is raised. This completes the cycle of secure password storage and authentication and demonstrates the role that cryptographic hashes play in it.

With that, we’ve probably talked enough about digest for now. There are some more advanced features available, but as long as you know that Digest::MD5, Digest::SHA1, and Digest::SHA256 exist and how to call hexdigest() on each of them, you have all you’ll need to know for most occasions. Hopefully, the examples here have illustrated some of the common use cases, and helped you think of your own in the process.

Mathematical Ruby Scripts (mathn)

The mathn standard library, when combined with the core Math module, serves to make mathematical operations more pleasant in Ruby. The main purpose of mathn is to pull in other standard libraries and integrate them with the rest of Ruby’s numeric system. You’ll notice this right away when doing basic arithmetic:

>> 1 / 2
=> 0

>> Math.sqrt(-1)
Errno::EDOM: Numerical argument out of domain - sqrt

>> require "mathn"
=> true
>> 1 / 2
=> 1/2
>> 1 / 2  + 5 / 7
=> 17/14
>> Math.sqrt(-1)
=> (0+1i)

As you can see, integer division gives way to Rational objects when mathn is loaded. These behave like the fractions you learned in grade school, keeping values in exact terms rather than expressing them as floats where possible. Numbers also gracefully extend into the Complex field, without error. Although this sort of behavior might seem unnecessary for day-to-day programming needs, it can be very helpful for mathematical applications.

In addition to changing the way basic arithmetic works, mathn pulls in a few of the higher-level mathematical constructs. For those interested in enumerating prime numbers (for whatever fun reason you might have in mind), a class is provided. To give you a peek at how it works, we can do things like ask for the first 10 primes, or count how many primes exist below a certain number:

>> Prime.first(10)
=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


>> Prime.find_index { |e| e > 1000 }
=> 168

In addition to enabling Prime, mathn also allows you to use Matrix and Vector without an explicit require:

>> Vector[1,2,3] * 3
=> Vector[3, 6, 9]

>> Matrix[[5,1,2],[3,1,4]].transpose
=> Matrix[[5, 3], [1, 1], [2, 4]]

These classes support all sorts of useful linear algebra operations, but I don’t want to overwhelm the casual Rubyist with mathematical details. Instead, we’ll look at a practical use of them and leave the theory as a homework assignment. Consider the simple drawing in Figure B.1, “A pair of triangles with a mathematical relationship”.

Figure B.1. A pair of triangles with a mathematical relationship

We see two rather exciting triangles, nuzzling up against each other at a single point. As it turns out, the smaller triangle is nothing more than a clone of the larger one, reflected, rotated, and scaled to fit. Here’s the code that makes all that happen:[20]

include Math

Canvas.draw("triangles.pdf") do
  points = Matrix[[0,0], [100,200], [200,100]]

  paint_triangle(*points)

  # reflect across y-axis
  points *= Matrix[[-1, 0],[0,1]]

  # rotate so bottom is flush with x axis.
  theta = -atan(1/2)
  points *= Matrix[[cos(theta), -sin(theta)],
                    [sin(theta),  cos(theta)]]

  # scale down by 50%
  points  *= 1/2
  paint_triangle(*points)
end

You don’t need to worry about the graphics-drawing code. Instead, focus on the use of Matrix manipulations here, and watch what happens to the points in each step. We start off with our initial triangle’s coordinates, as such:

>> points = Matrix[[0,0], [100,200], [200,100]]
=> Matrix[[0, 0], [100, 200], [200, 100]]

Then, we multiply by a 2 × 2 matrix that performs a reflection across the y-axis:

>> points *= Matrix[[-1, 0],[0,1]]
=> Matrix[[0, 0], [-100, 200], [-200, 100]]

Notice here that the x values are inverted, while the y value is left untouched. This is what translates our points from the right side to the left. Our next task uses a bit of trigonometry to rotate our triangle to lie flat along the x-axis. Notice here that we use the arctan of 1/2, because the bottom edge of the triangle on the right rises halfway toward the upper boundary before terminating. If you aren’t familiar with how this calculation works, don’t worry—just observe its results:

>> theta = -atan(1/2)
=> -0.463647609000806

>> points *= Matrix[[cos(theta), -sin(theta)],
?>                  [sin(theta), cos(theta)] ]
=> Matrix[[0.0, 0.0], [-178.885438199983, 134.164078649987], [-223.606797749979, 0.0]]

The numbers got a bit ugly after this calculation, but there is a key observation to make here. The triangle’s dimensions were preserved, but two of the points now lie on the x-axis. This means our rotation was successful.
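As an extra sanity check (not from the original text), we can confirm that the rotation preserved the triangle's edge lengths, using the matrix library directly rather than mathn:

```ruby
require "matrix"
include Math

points  = Matrix[[0, 0], [-100, 200], [-200, 100]]
theta   = -atan(0.5)
rotated = points * Matrix[[cos(theta), -sin(theta)],
                          [sin(theta),  cos(theta)]]

# Length of the edge between the second and third vertices.
edge = ->(m) { (m.row(1) - m.row(2)).r }

# A rotation is an isometry, so the edge lengths should match
# (up to floating-point error).
p (edge.(points) - edge.(rotated)).abs < 1e-9  # => true
```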

Finally, we do scalar multiplication to drop the whole triangle down to half its original size:

>> points *= 1/2
=> Matrix[[0.0, 0.0], [-89.4427190999915, 67.0820393249935], [-111.803398874989, 0.0]]

This completes the transformation and shows how the little triangle was developed simply by manipulating the larger one. Although this is certainly a bit of an abstract example, it hopefully serves as sufficient motivation for learning a bit more about matrices. Although they can certainly be used for more hardcore calculations, simple linear transformations such as the ones shown in this example come cheap and easy and demonstrate an effective way to do some interesting graphics work.

Although truly hardcore math might be better suited for a more special-purpose language, Ruby is surprisingly full-featured enough to write interesting math programs with. As this particular topic can run far deeper than I have time to discuss, I will leave further investigation to the interested reader. The key thing to remember is that mathn puts Ruby in a sort of “math mode” by including some of the most helpful standard libraries and modifying the way that Ruby does its basic arithmetic. This feature is so useful that irb includes a special switch -m, which essentially requires mathn and then includes the Math module at the top level.

A small caveat to keep in mind when working with mathn is that it is fairly aggressive about the changes it makes. If you are building a Ruby library, you may want to be a bit more conservative and use the individual packages it enables one by one rather than having to deal with the consequences of potentially breaking code that relies on behaviors such as integer division.
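For library code, the conservative route might look like the following sketch (mine, not from the text): require only the pieces you need, leaving core arithmetic untouched.

```ruby
# Pull in Prime and Matrix individually instead of loading mathn.
require "prime"
require "matrix"

p Prime.first(5)    # => [2, 3, 5, 7, 11]
p 1 / 2             # => 0 -- integer division is left alone
p Vector[1, 2] * 3  # => Vector[3, 6]
```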

All that having been said, if you’re working on your math homework, or building a specialized mathematical application in Ruby, feel free to go wild with all that mathn has to offer.

Working with Tabular Data (csv)

If you need to represent a data table in plain-text format, CSV (comma-separated value) files are about as simple as you can get. These files can easily be processed by almost any programming language, and Ruby is no exception. The csv standard library is fast for pure Ruby, internationalized, and downright pleasant to work with.

In the simplest cases, it’d be hard to make things easier. For example, say you had a CSV file (payments.csv) that looked like this:

name,payment
Gregory Brown,100
Joe Comfort,150
Jon Juraschka,200
Gregory Brown,75
Jon Juraschka,250
Jia Wu,25
Gregory Brown,50
Jia Wu,75

If you want to just slurp this into an array of arrays, it can’t be easier:

>> require "csv"
=> true

>> CSV.read("payments.csv")
=> [["name", "payment"], ["Gregory Brown", "100"], ["Joe Comfort", "150"],
    ["Jon Juraschka", "200"], ["Gregory Brown", "75"], ["Jon Juraschka", "250"],
    ["Jia Wu", "25"], ["Gregory Brown", "50"], ["Jia Wu", "75"]]

Of course, slurping files isn’t a good idea if you want to handle only a subset of data, but csv makes row-by-row handling easy. Here’s an example of how you’d capture only my records:

>> data = []
=> []
>> CSV.foreach("payments.csv") { |row| data << row if row[0] == "Gregory Brown" }
=> nil

>> data
=> [["Gregory Brown", "100"], ["Gregory Brown", "75"], ["Gregory Brown", "50"]]

A common convention is to have the first row of a CSV file represent header data. csv can give you nicer accessors in this case:

>> data = []
=> []
>> CSV.foreach("payments.csv", :headers => true) do |row|
?>   data << row if row['name'] == "Gregory Brown"
>> end
=> nil
>> data
=> [#<CSV::Row "name":"Gregory Brown" "payment":"100">,
    #<CSV::Row "name":"Gregory Brown" "payment":"75">,
    #<CSV::Row "name":"Gregory Brown" "payment":"50">]

CSV::Row is a sort of hash/array hybrid. The primary feature that distinguishes it from a hash is that it allows for duplicate field names. Here’s an example of how that works. Given a simple file with nonunique column names like this (phone_numbers.csv):

applicant,phone_number,spouse,phone_number
James Gray,555 555 5555,Dana Gray,123 456 7890
Gregory Brown,098 765 4321,Jia Wu,222 222 2222

we can extract both "phone_number" fields, as shown here:

>> data = CSV.read("phone_numbers.csv", :headers => true)
=> #<CSV::Table mode:col_or_row row_count:3>

>> data.map { |r| [ r["phone_number"], r["phone_number",2] ] }
=> [["555 555 5555", "123 456 7890"], ["098 765 4321", "222 222 2222"]]

We see that CSV::Row#[] takes an optional second argument that is an offset from which to begin looking for a field name. For this particular data, r["phone_number",0] and r["phone_number",1] would resolve as the first phone number field; an index of 2 or 3 would look up the second phone number. If we know the names of the columns near each phone number, we can do this in a bit of a smarter way:

>> data.map { |r| [ r["phone_number", r.index("applicant")],
?>                  r["phone_number", r.index("spouse")] ] }
=> [["555 555 5555", "123 456 7890"], ["098 765 4321", "222 222 2222"]]

Although this still depends on ordinal positioning to some extent, it allows us to do a relative index lookup. If we know that “phone number” is always going to be next to “applicant” and “spouse,” it doesn’t matter which column they start at. Whenever you can take advantage of this sort of flexibility, it’s a good idea to do so.

So far, we’ve talked about reading files, but the csv library handles writing as well. Rather than continue with our irb-based exploration, I’ll just combine the features that we’ve already gone over with a couple new ones, so that we can look at a tiny but fully functional script.

Our task is to convert the previously mentioned payments.csv file into a summary report (payment_summary.csv), which will look like this:

name,total payments
Gregory Brown,225
Joe Comfort,150
Jon Juraschka,450
Jia Wu,100

Here, we’ve done a grouping on name and summed up the payments. If you thought this might be a complicated process, you thought wrong. Here’s all that needs to be done:

require "csv"

@totals = Hash.new(0)

csv_options = {:headers => true, :converters => :numeric }

CSV.foreach("payments.csv", csv_options) do |row|
  @totals[row['name']] += row['payment']
end

CSV.open("payment_summary.csv", "w") do |csv|
  csv << ["name","total payments"]
  @totals.each { |row| csv << row }
end

The core mechanism for doing the grouping-and-summing operation is just a hash with default values of zero for unassigned keys. If you haven’t seen this technique before, be sure to make a note of it, because you’ll see it all over Ruby scripts. The rest of the work is easy once we exploit this little trick.
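In isolation, the default-value trick looks like this:

```ruby
# A hash whose missing keys read as 0, handy for tallying.
totals = Hash.new(0)
[["a", 1], ["b", 2], ["a", 3]].each { |name, amount| totals[name] += amount }

p totals  # => {"a"=>4, "b"=>2}
```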

As far as processing the initial CSV file goes, the only new trick we’ve added is to specify the option :converters => :numeric. This tells csv to hit each cell with a check to see whether it contains a valid Ruby number. It then does the right thing and converts it to a Fixnum or Float if there’s a match. This lets us normalize our data as soon as it’s loaded, rather than litter our code with to_i and to_f calls.

For this reason, the foreach loop is nothing more than simple addition, keying the name to a running total of payments.

Finally, we get to the writing side of things. We use CSV.open in a similar way to how we might use File.open, and we populate a CSV object by shoving arrays that represent rows into it. This code is a little prettier than it might be in the general case, as we’re working with only two columns, but you should be able to see that the process is relatively straightforward nonetheless.

Here we see a useful little script based on the csv library weighing in at around 10 lines of code. As impressive as that might be, we haven’t even scratched the surface on this one, so be sure to dig deeper if you have a need for processing tabular datafiles. One thing that I didn’t show at all is that csv handles reading and writing from strings just as well as it does files, which may be useful in web applications and other places where there is a need to stream files rather than work directly with the filesystem. Another is dealing with different column and row record separators. Luckily, the csv library is comparatively well documented, so all of these things are just a quick API documentation search away.
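To give one quick taste of the string-based interface: CSV.parse reads from a string and CSV.generate builds one up. This is a sketch using the same option style as the earlier examples:

```ruby
require "csv"

# Parse CSV data held in a string rather than a file.
rows = CSV.parse("name,payment\nJia Wu,25\n", :headers => true)
p rows[0]["payment"]  # => "25"

# Build a CSV string in memory instead of writing to disk.
out = CSV.generate do |csv|
  csv << ["name", "payment"]
  csv << ["Jia Wu", 25]
end
p out  # => "name,payment\nJia Wu,25\n"
```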

A great thing about the newly revamped csv standard library is that you don’t necessarily need to upgrade to Ruby 1.9 to use it. It started as a third-party alternative to Ruby 1.8’s CSV standard library, under the name FasterCSV, and this project is still supported under Ruby 1.8.6. So if you like what you see here, and you want to use it in some of your legacy Ruby code, you can always install the fastercsv gem and be up and running.

Transactional Filesystem-Based Data Storage (pstore)

PStore provides a simple, transactional database for storing Ruby objects within a file. This gives you a persistence layer without relying on any external resources, which can be very handy. Using PStore is so simple that I can forgo most of the details and jump right into some code that I use for a Sinatra microapp at work. What follows is a very simple backend for storing replies to an anonymous survey:

class SuggestionBox
  def initialize(filename="suggestions.pstore")
    @filename = filename
  end

  def store
    @store ||= PStore.new(@filename)
  end

  def add_reply(reply)
    store.transaction do
      store[:replies] ||= []
      store[:replies] << reply
    end
  end

  def replies
    store.transaction(true) do
      store[:replies]
    end
  end

  def clear_replies
    store.transaction do
      store[:replies] = []
    end
  end
end

In our application, the usage of this object for storing responses is quite simple. Given that box is just a SuggestionBox in the following code, we just have a single call to add_reply that looks something like this:

box.add_reply(:question1 => params["question1"],
              :question2 => params["question2"])

Later, when we need to generate a PDF report of all the replies in the suggestion box, we do something like this to extract the responses to the two questions:

question1_replies = []
question2_replies = []
box.replies.each do |reply|
  question1_replies << reply[:question1]
  question2_replies << reply[:question2]
end

So that covers the usage, but let’s go back in a little more detail to the implementation. You can see that our store is initialized by just constructing a new PStore object and passing it a filename:

def store
  @store ||= PStore.new(@filename)
end

Then, when it comes to using our PStore, it basically looks like we’re dealing with a hash-like object, but all of our interactions with it are through these transaction blocks:

def add_reply(reply)
  store.transaction do
    store[:replies] ||= []
    store[:replies] << reply
  end
end

def replies
  store.transaction(true) do
    store[:replies]
  end
end

def clear_replies
  store.transaction do
    store[:replies] = []
  end
end

So the real question here is what do we gain? The answer is, predictably, a whole lot.[21]

By using PStore, we can be sure that only one write-mode transaction is open at a time, preventing issues with partial reads/writes in multiprocessed applications. This means that if we attempt to produce a report against SuggestionBox while a new suggestion is being written, our report will wait until the write operation completes before it processes. However, when all transactions are read-only, they will not block each other, allowing them to run concurrently.

Every transaction reloads the file at its start, keeping things up-to-date and synchronized. Every write that is done checks the MD5 sum of the contents to avoid unnecessary writes for unchanging data. If something goes wrong during a write, all the write operations in a transaction are rolled back and an exception is raised. In short, PStore provides a fairly robust persistence framework that is suitable for use across multiple applications or threads.
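The rollback behavior is easy to see firsthand. In this sketch (the filename is made up), a failed transaction leaves the previously committed value intact:

```ruby
require "pstore"

store = PStore.new("rollback_demo.pstore")
store.transaction { store[:count] = 1 }

begin
  store.transaction do
    store[:count] = 99
    raise "boom"  # aborts the transaction; the write above is discarded
  end
rescue RuntimeError
end

# A read-only transaction shows the original committed value survived.
store.transaction(true) { p store[:count] }  # => 1
```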

Of course, though it is great for what it does, PStore has notable limitations. Because it loads the entire dataset on every read, and writes the whole dataset on every write, it is very I/O-intensive. Therefore, it’s not meant to handle very high load or large datasets. Additionally, as it is essentially nothing more than a file-based Hash object, it cannot serve as a substitute for some sort of SQL server when dealing with relational data that needs to be efficiently queried. Finally, because it uses the core utility Marshal to serialize objects to disk,[22] PStore cannot be used to store certain objects. These include anonymous classes and Proc objects, among other things.
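You can see the Marshal restriction directly: attempting to dump a Proc raises a TypeError, which is why PStore could never write one to disk.

```ruby
# Marshal refuses non-serializable objects such as Procs.
begin
  Marshal.dump(lambda { 42 })
rescue TypeError => e
  puts "cannot store: #{e.message}"
end
```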

Despite these limitations, PStore has a very wide sweet spot in which it is the right way to go. Whenever you need to persist a modest amount of nonrelational data and possibly share it across processes or threads, it is usually the proper tool for the job. Although it is possible to code up your own persistence solutions on top of Ruby’s raw serialization support, PStore solves most of the common needs in a rather elegant way.

Human-Readable Data Serialization (json)

JavaScript Object Notation (JSON) is an object serialization format that has been gaining a ton of steam lately. With the rise of a service-oriented Web, the need for a simple, language-independent data serialization format has become more and more apparent.

Historically, XML has been used for interoperable data serialization. However, using XML for this is a bit like going bird hunting with a bazooka: it’s just way more firepower than the job requires. JSON aims to do one thing and do it well, and these constraints give rise to a human-readable, human-editable, easy-to-parse, and easy-to-produce data interchange format.

The primitive constructs provided by JSON are limited to hashes (called objects in JSON), arrays, strings, numbers, and the conditional triumvirate of true, false, and nil (called null in JSON). It’s easy to see that these concepts trivially map to core Ruby objects. Let’s take a moment to see how each of these data types is represented in JSON:

require "json"

hash = { "Foo" => [Math::PI, 1, "kittens"],
         "Bar" => [false, nil, true], "Baz" => { "X" => "Y" } } #...

puts hash.to_json #=> Outputs
{"Foo":[3.14159265358979,1,"kittens"],"Bar":[false,null,true],"Baz":{"X":"Y"}}

There isn’t really much to it. In fact, JSON is somewhat syntactically similar to Ruby. Though the similarity is only superficial, it is nice to be able to read and write structures in a format that doesn’t feel completely alien.

If we go the other direction, from JSON into Ruby, you’ll see that the transformation is just as easy:

require "json"

json_string = '{"Foo":[3.14159265358979,1,"kittens"], ' +
              '"Bar":[false,null,true],"Baz":{"X":"Y"}}'

hash = JSON.parse(json_string)

p hash["Bar"] #=> Outputs
[false,nil,true]

p hash["Baz"] #=> Outputs
{ "X"=>"Y" }

Without knowing much more about Ruby’s json standard library, you can move on to building useful things. As long as you know how to navigate the nested hash and array structures, you can work with pretty much any service that exposes a JSON interface as if it were responding to you with Ruby structures. As an example, we can look at a fairly simple interface that does a web search and processes the JSON dataset it returns:

require "json"
require "open-uri"
require "cgi"

module GSearch
  extend self

  API_BASE_URI =
    "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q="

  def show_results(query)
    results = response_data(query)
    results["responseData"]["results"].each do |match|
      puts CGI.unescapeHTML(match["titleNoFormatting"]) + ":\n  " + match["url"]
    end
  end

  def response_data(query)
    data = open(API_BASE_URI + URI.escape(query),
                  "Referer" => "http://rubybestpractices.com").read
    JSON.parse(data)
  end

end

Here we’re using json and the open-uri library, which was discussed earlier in this appendix as a way to wrap a simple Google web search. When run, this code will print out a few page titles and their URLs for any query you enter. Here’s a sample of GSearch in action:

GSearch.show_results("Ruby Best Practices") # OUTPUTS

Ruby Best Practices: Rough Cuts Version | O'Reilly Media:
  http://oreilly.com/catalog/9780596156749/
Ruby Best Practices: The Book and Interview with Gregory Brown:
  http://www.rubyinside.com/ruby-best-practices-gregory-brown-interview-1332.html
On Ruby: A 'Ruby Best Practices' Blogging Contest:
  http://on-ruby.blogspot.com/2008/12/ruby-best-practices-blogging-contest.html
Gluttonous : Rails Best Practices, Tips and Tricks:
  http://glu.ttono.us/articles/2006/02/06/rails-best-practices-tips-and-tricks

Maybe by the time this book comes out, we’ll have nabbed all four of the top spots, but that’s beside the point. You’ll want to notice that the GSearch module interacts with the json library in a sum total of one line, in order to convert the dataset into Ruby. After that, the rest is business as usual.

The interesting thing about this particular example is that I didn’t read any of the documentation for the search API. Instead, I tried a sample query, converted the JSON to Ruby, and then used Hash#keys to tell me what attributes were available. From there I continued to use ordinary Ruby reflection and inspection techniques straight from irb to figure out which fields were needed to complete this example. By thinking of JSON datasets as nothing more than the common primitive Ruby objects, you can accomplish a lot using the skills you’re already familiar with.

After seeing how easy it is to consume JSON, you might be wondering how you’d go about producing it using your own custom objects. As it turns out, there really isn’t that much to it. Say, for example, you had a Point class that was responsible for doing some calculations, but that at its essence was just an ordered pair representing an [x,y] coordinate. Producing the JSON to match this is easy:

require "json"

class Point

  def initialize(x,y)
    @x, @y = x, y
  end

  def distance_to(point)
    Math.hypot(point.x - x, point.y - y)
  end

  attr_reader :x, :y

  def to_json(*args)
    [x,y].to_json(*args)
  end

end

point_a = Point.new(1,2)
puts point_a.to_json #=> "[1,2]"

point_data = JSON.parse('[4,6]')
point_b = Point.new(*point_data)

puts point_b.distance_to(point_a) #=> 5.0

Here, we have simply represented our core data in primitives and then wrapped our object model around it. In many cases, this is the simplest, most implementation-independent way to represent one of our objects.

However, in some cases you may wish to let the object internally interpret the structure of a JSON document and do the wrapping for you. The Ruby json library provides a simple hook that depends on a bit of metadata to convert our parsed JSON into a customized higher-level Ruby object. If we rework our example, we can see how it works:

require "json"

class Point
  def initialize(x,y)
    @x, @y = x, y
  end

  def distance_to(point)
    Math.hypot(point.x - x, point.y - y)
  end

  attr_reader :x, :y

  def to_json(*args)
    { 'json_class' => self.class.name,
      'data' => [@x, @y] }.to_json(*args)
  end

  def self.json_create(obj)
    new(*obj['data'])
  end
end

point_a = Point.new(1,2)
puts point_a.to_json #=> {"json_class":"Point","data":[1,2]}


point_b = JSON.parse('{"json_class":"Point","data":[4,6]}')
puts point_b.distance_to(point_a) #=> 5.0

Although a little more work needs to be done here, we can see that the underlying mechanism for a direct Ruby→JSON→Ruby round trip is simple. The JSON library depends on the attribute json_class, which points to a string that represents a Ruby class name. If this class has a method called json_create, the parsed JSON data is passed to this method and its return value is returned by JSON.parse. Although this approach involves rolling up our sleeves a bit, it is nice to see that there is not much magic to it.

Although this adds some extra noise to our JSON output, it does not add any new constructs, so there is not an issue with other programming languages being able to parse it and use the underlying data. It simply comes with the added benefit that Ruby applications can map raw primitive data to the classes that wrap them.

Depending on your needs, you may prefer one technique over the other. The benefit of providing a simple to_json hook that produces raw primitive values is that it keeps your serialized data completely implementation-agnostic. This will come in handy if you need to support a wide range of clients. The benefit of using the "json_class" attribute is that you do not have to think about manually building up high-level objects from your object data. This is most beneficial when you are serving up data to primarily Ruby clients, or when your data is complex enough that manually constructing objects would be painful.

No matter what your individual needs are, it’s safe to say that JSON is something to keep an eye on moving forward. Ruby’s implementation is fast and easy to work with. If you need to work with or write your own web services, this is definitely a tool you will want to familiarize yourself with.

Embedded Ruby for Code Generation (erb)

Code generation can be useful for dynamically generating static files based on a template. When we need this sort of functionality, we can turn to the erb standard library. ERB stands for Embedded Ruby, which is ultimately exactly what the library facilitates.

In the most basic case, a simple ERB template[23] might look like this:

require 'erb'

x = 42
template = ERB.new("The value of x is: <%= x %>")
puts template.result(binding)

The resulting text looks like this:

The value of x is: 42

If you’ve not worked with ERB before, you may be wondering how this differs from ordinary string interpolation, such as this:

x = 42
puts "The value of x is: #{x}"

The key difference to recognize here is the way the two strings are evaluated. When we use string interpolation, our values are substituted immediately. When we evaluate an ERB template, we do not actually evaluate the expression inside the <%= ... %> until we call ERB#result. That means that although this code does not work at all:

string = "The value of x is: #{x}"
x = 42
puts string

the following code will work without any problems:

require 'erb'

template = ERB.new("The value of x is: <%= x %>")
x = 42

puts template.result(binding)

This is the main reason why ERB can be useful to us. We can write templates ahead of time, referencing variables and methods that may not exist yet, and then bind them just before rendering time using binding.

We can also include some logic in our files, to determine what should be printed:

require "erb"

class A

  def initialize(x)
    @x = x
  end

  attr_reader :x

  public :binding

  def eval_template(string)
    ERB.new(string,0,'<>').result(binding)
  end

end

template = <<-EOS

<% if x == 42 %>
You have stumbled across the Answer to the Life, the Universe, and Everything
<% else %>
The value of x is <%= x %>
<% end %>
EOS

foo = A.new(10)
bar = A.new(21)
baz = A.new(42)

[foo, bar, baz].each { |e| puts e.eval_template(template) }

Here, we run the same template against three different objects, evaluating it within the context of each of their bindings. The more complex ERB.new call here sets the safe_level the template is executed in to 0 (the default), but this is just because we want to provide the third argument, trim_mode. When trim_mode is set to "<>", the newlines are omitted for lines starting with <% and ending with %>. As this is useful for embedding logic, we need to turn it on to keep the generated string from having ugly stray newlines. The final output of the script looks like this:

The value of x is 10
The value of x is 21
You have stumbled across the Answer to Life, the Universe, and Everything

As you can see, ERB does not emit text when it is within a conditional block that is not satisfied. This goes a long way toward building up dynamic output, as you can use all of your normal control structures to determine what text should and should not be rendered.
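The same holds for loops. Here is a minimal sketch (the item list is hypothetical, not from the book's examples); note that in Ruby 2.6 and later, trim_mode is passed to ERB.new as a keyword argument rather than as the third positional argument shown above:

```ruby
require "erb"

# A template with an embedded loop. With trim_mode "<>", the lines that
# hold only the <% ... %> logic do not leave stray newlines in the output.
template = ERB.new(<<~EOS, trim_mode: "<>")
  <% items.each do |name, qty| %>
  - <%= name %>: <%= qty %>
  <% end %>
EOS

items = { "apples" => 3, "pears" => 5 }
output = template.result(binding)
puts output
```

This prints one bullet line per item, with no blank lines where the loop markers sat in the template.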

The documentation for the erb library is quite good, so I won’t attempt to dig much deeper here. Of course, no mention of ERB would be complete without an example of HTML templating. Although the API documentation goes into much more complicated examples, I can’t resist showing the template that I use for rendering entries in my blog engine.[24] This doesn’t include the site layout, but just the code that gets run for each entry:

<h2><%= title %></h2>

<%= entry %>

<div align="right">
<p><small>Written by <%= Blaag::AUTHOR %> on
<%= published_date.strftime("%Y.%m%.%d") %> at
<%= published_date.strftime("%H:%M" )%>  | <%= related %>

</small>
  </p>
</div>

This code is evaluated in the context of a Blaag::Entry object, which, as you can clearly see, does most of the heavy lifting through helper methods. Although this example might be boring and a bit trivial, it shows something worth keeping in mind. Just because you can do all sorts of logic in your ERB templates doesn’t mean that you should. The fact that these templates can be evaluated in a custom binding means that you are able to keep your code where it should be while avoiding messy string interpolation. If your ERB templates start to look more like Ruby scripts than templates, you’ll want to clean things up before you start pulling your hair out.
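To make the pattern concrete, here is a hypothetical sketch of a presenter object along those lines; the class and method names are invented for illustration, not taken from the Blaag source. The template stays declarative, while formatting logic lives in ordinary methods that the binding exposes:

```ruby
require "erb"

# A presenter-style object: the ERB template only calls simple helper
# methods, and all formatting decisions are made in Ruby code.
class EntryPresenter
  TEMPLATE = ERB.new(<<~HTML)
    <h2><%= title %></h2>
    <p><%= body %></p>
    <p><small>Written on <%= published_on %></small></p>
  HTML

  def initialize(title, body, published_at)
    @title, @body, @published_at = title, body, published_at
  end

  attr_reader :title, :body

  # Date formatting belongs here, not in the template
  def published_on
    @published_at.strftime("%Y.%m.%d")
  end

  def render
    TEMPLATE.result(binding)
  end
end

entry = EntryPresenter.new("Hello", "First post", Time.utc(2009, 6, 1))
puts entry.render
```

Because the template is evaluated in the instance's binding, `title`, `body`, and `published_on` resolve to the presenter's methods, keeping the ERB itself free of logic.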

Using ERB for templating can really come in handy. Whether you need to generate a form letter or just plug some values into a complicated data format, the ability to late-bind data to a template and generate dynamic content on the fly from static templates is powerful indeed. Just be sure to keep in mind that ERB is meant to supplement your ordinary code rather than replace it, and you’ll be able to take advantage of this useful library without creating a big mess.
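A form letter is about the simplest possible illustration of that late binding. In this sketch (the recipient fields are hypothetical), the template is written once and then bound to per-recipient data at render time:

```ruby
require "erb"

# One template, many bindings: each pass through the loop supplies
# fresh local variables for the template to pick up via binding.
letter = ERB.new(<<~EOS)
  Dear <%= name %>,

  Your order #<%= order_id %> has shipped.
EOS

recipients = [
  { name: "Ada",   order_id: 101 },
  { name: "Grace", order_id: 102 }
]

recipients.each do |r|
  name     = r[:name]
  order_id = r[:order_id]
  puts letter.result(binding)
end
```

In Ruby 2.6 and later you can skip the local-variable shuffle with ERB#result_with_hash, e.g. `letter.result_with_hash(name: "Ada", order_id: 101)`.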

Conclusions

Hopefully these examples have shown the diversity you can come to expect from Ruby’s standard library. What you have probably noticed by now is that there really is a lot there—so much so that it might be a little overwhelming at first. Another observation you may have made is that there is little consistency of interface across the libraries. This is because many of them were written by different people at different stages in Ruby’s evolution.

However, assuming that you can tolerate the occasional wart, a solid working knowledge of what is available in Ruby’s standard libraries is a key part of becoming a masterful Rubyist. It goes beyond simply knowing about and using these tools, though. Many of the libraries discussed here are written in pure Ruby, which means that you can actually learn a lot by grabbing a copy of Ruby’s source and reading through their implementations. I know I’ve learned a lot in this manner, so I wholeheartedly recommend it as a way to test and polish your Ruby chops.



[19] A DateTime is also similar to Ruby’s Time object, but it is not constrained to representing only those dates that can be represented in Unix time. See http://en.wikipedia.org/wiki/Unix_time.

[20] To run this example, you’ll need my Prawn-based helpers. These are included in this book’s git repository.

[21] The following paragraph summarizes a ruby-talk post from Ara T. Howard, found at http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177666.

[22] The yaml/store library will allow you to use YAML in place of Marshal, but with many of the same limitations.

[23] Credit: this is the first example in the ERB API documentation.

If you enjoyed this excerpt, buy a copy of Ruby Best Practices.

Copyright © 2009 O'Reilly Media, Inc.