Why the Lucky Stiff’s fast, enjoyable Hpricot library makes hard Rails view tests effective and fun. Hpricot is a deep and useful HTML parser with a wide, flexible interface, and it supports several clever systems for reading and editing HTML. When we put it to work in Rails functional tests, it offers many different ways to pin down the details of a page.

This dissertation uses Ruby on Rails, but the general techniques apply to any web development. The assertions presented here are available in the Rails plugin assert_xpath.

Hpricot’s archenemy is REXML, an XML parser bundled with Ruby. Here’s their score chart:

  Criterion    Hpricot                                      REXML
  compliance   forgives anything vaguely resembling HTML    too strict for Transitional XHTML
  utility      several creative Domain Specific Languages,  XPath, and a terse Object Model
               for lean, clear expressions
  queries      good CSS selectors and poor XPath            no CSS selectors and perfect XPath
  speed        optimized with C                             pessimized with Regexps

The test plugin assert_xpath supports both systems and enhances their DSLs.

A Rails functional test works by mocking the web server and generating a sample web page as a big string, in the variable @response.body. Then a test case parses this string, looking for its important details. This technique avoids the overhead of invoking a real web server and browser, and of commanding each to do something outside its performance envelope. The two common query languages for HTML are CSS selectors and XPath.

Here’s a test case using raw Hpricot, before we cook it up in reusable assertions:

  def test_raw_Hpricot
    get :index, :id => 'FrontPage'          #  serve a WikiWiki page
    doc = Hpricot(@response.body)           #  parse the mock server response
    script = doc.search('script[3]').first  #  locate our target <script>
    assert_not_nil script, 'the page should have a third <script>'
    assert_equal 'text/javascript', script['type'], 'script should be JS'
    assert_match(/&/, script.to_s, 'our script should contain a raw & character')
  end

Test cases can choose between Hpricot and REXML to leverage each one’s advantages. assert_xml uses either, depending on whether invoke_hpricot or invoke_rexml was called most recently. (Combine this technique with the Abstract Test Pattern to run the same assertions twice, once per parser.) Call assert_hpricot or assert_rexml directly to override this default.
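The routing that assert_xml performs can be pictured with a small sketch. This is not assert_xpath’s actual source: the module name XmlBackendSwitch, the stub assertions, and the choice of Hpricot as the default are all assumptions, made so the example runs without either parser installed.

```ruby
# Hypothetical sketch of a parser switch; not assert_xpath's real code.
module XmlBackendSwitch
  # Record which parser assert_xml should route to.
  def invoke_hpricot
    @xml_backend = :hpricot
  end

  def invoke_rexml
    @xml_backend = :rexml
  end

  # Route to whichever backend was chosen most recently.
  # Assumption: fall back to Hpricot when neither has been called.
  def assert_xml(*args)
    send("assert_#{@xml_backend || :hpricot}", *args)
  end
end

# Stand-in test class: these stub assertions only record each call,
# so the sketch runs without Hpricot or REXML.
class FakeTest
  include XmlBackendSwitch
  attr_reader :calls

  def initialize
    @calls = []
  end

  def assert_hpricot(*args)
    @calls << [:hpricot, *args]
  end

  def assert_rexml(*args)
    @calls << [:rexml, *args]
  end
end

t = FakeTest.new
t.assert_xml '<a/>'
t.invoke_rexml
t.assert_xml '<b/>'
t.invoke_hpricot
t.assert_xml '<c/>'
p t.calls  # => [[:hpricot, "<a/>"], [:rexml, "<b/>"], [:hpricot, "<c/>"]]
```

Under the Abstract Test Pattern, a shared module of test methods would call only assert_xml, and two concrete test classes would call invoke_hpricot or invoke_rexml in their setup methods, so every shared assertion runs once per parser.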

Assertiveness Counseling

Now we bundle those Hpricot calls up into two assertions, assert_hpricot and assert_xpath:

  def test_with_assert_hpricot
    get :index, :id => 'FrontPage'
    assert_hpricot  #  parses @response.body by default
    script = assert_xpath('script[3]')
    assert_equal 'text/javascript', script['type'], 'script should be JS'
    assert_match(/&/, script.to_s, 'our script should contain a raw & character')
  end

Because Hpricot is forgiving, assert_hpricot itself does not actually assert very much! (Use assert_rexml or assert_tidy to validate your code.) The important part is the next line, assert_xpath, which wraps doc.search, so we can pass it a wide subset of XPath. In this case we only pass a positional predicate, [3], to select the third <script>.

You Are All Forgiven

Like a web browser, Hpricot forgives your HTML for its sins. Some test cases should not. But REXML is so unforgiving that Transitional XHTML might break it. The fun starts when your XML contains an & that is not part of a well-formed escape:

      #  both Hpricot and REXML like well-formed & escapes:

      assert_xml '<a>&amp;</a>', 'a[ "&" = . ]'
      #           ^ input XML     ^ XPath to satisfy

      #  only Hpricot likes ill-formed escapes:

      assert_hpricot '<a>&</a>', 'a[ "&" = . ]'

      assert_raise_message REXML::ParseException, /Illegal character '&'/ do
        assert_rexml '<a>&</a>', 'a[ "&" = . ]'
      end

      #  and both like incomplete escapes!

      assert_xml '<a>&yo</a>', 'a[ "&yo" = . ]'

Why is that important? Because web browsers don’t process the escapes found in embedded JavaScript. That forces our tools to leave those escapes unescaped when they generate HTML, which is incorrect by the letter of XML. So a Rails call to javascript_tag("document.write('&amp;');"), for example, will emit this:

  <script type="text/javascript">
  //<![CDATA[
  document.write('&amp;');
  //]]>
  </script>
  

Bless ActionView’s pointy head for wrapping the entire block in a CDATA section, which is correct XML. But without that wrapper, the “law” (or the “recommendations”) says the output should contain &amp;amp;. Browsers should interpret that and pass &amp; as a source code literal to JavaScript, and this should push &amp; into the browser’s surface, which should then display & to your user. If an HTML tool like javascript_tag “corrected” that &amp; to &amp;amp;, modern browsers would not unescape it before the JavaScript layer, and your users would see &amp;. That’s not what you wanted, and browsers can’t change this behavior until everyone in the world who wrote their websites with Notepad upgrades their source. Don’t hold your breath. And so javascript_tag doesn’t escape the &amp; to &amp;amp;.
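To make those layers concrete, here is a plain-Ruby sketch of what the strict “law” would ask for. The helpers escape_html and unescape_html are hypothetical and hand-rolled (they handle only &, <, > and "), standing in for an XML serializer and an XML parser respectively, so the example needs no libraries.

```ruby
# Hypothetical one-layer escaper (a stand-in for an XML serializer).
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, ESCAPES)
end

# Hypothetical one-layer unescaper (a stand-in for an XML parser).
# A single regex pass means '&amp;amp;' decodes to '&amp;', not '&'.
def unescape_html(text)
  text.gsub(/&(?:amp|lt|gt|quot);/, ESCAPES.invert)
end

js_source = "document.write('&amp;');"   # the JavaScript we want to run

# What the "law" asks for outside CDATA: escape the source once for markup.
strict = escape_html(js_source)
puts strict                  # document.write('&amp;amp;');

# An XML parser strips one layer, so JavaScript would see the right source...
puts unescape_html(strict)   # document.write('&amp;');

# ...but an HTML browser passes <script> text through untouched, so
# JavaScript would receive `strict` itself, write '&amp;amp;' into the
# page, and the user would see '&amp;'.
```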

The culture of XML enforces well-formed contents, which are typically machine-generated. So even though REXML tolerates an & followed by alphabetic characters (like the &yo above), it still chokes on every other bare &, such as the && of a JavaScript logical-and. And you can’t escape them, because your browser won’t unescape them inside a <script>. If these problems prevent you from using assert_rexml, prepare your XHTML first with a call like: @response.body.gsub!(%r/&(?=[^a-z])/i, '&amp;')
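Wrapped in a helper, that workaround behaves as follows. The method name escape_stray_ampersands is hypothetical; the regular expression is the one above. Unlike gsub!, it returns a new string instead of editing @response.body in place. The last two calls show two limits of the pattern: a numeric character reference such as &#38; gets re-escaped, and a bare & at the very end of the string is left alone, because the lookahead needs a following character.

```ruby
# Hypothetical helper wrapping the gsub above: escape each & that is
# followed by a non-letter, leaving &amp;, &lt;, &yo and friends alone.
def escape_stray_ampersands(html)
  html.gsub(%r/&(?=[^a-z])/i, '&amp;')
end

puts escape_stray_ampersands('if (a && b) { x = y & 1; }')
# if (a &amp;&amp; b) { x = y &amp; 1; }

puts escape_stray_ampersands('<a>&amp; &yo</a>')
# <a>&amp; &yo</a>

# Limits of this pattern:
puts escape_stray_ampersands('&#38;')   # &amp;#38;  (numeric reference re-escaped)
puts escape_stray_ampersands('a &')     # a &        (trailing & not matched)
```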

Hpricot doesn’t have all these problems.

Functional Tests for Views

A Rails test that operates on a controller is a “functional test”. These tests should guide the development of complete features. Ideally, all our low-level data manipulations should appear inside models, while controllers coordinate data transactions and send results to views. So the place to start view testing is the functional tests, where each page we render comes back as a big string.

  def test_buy_item_form
    login_as :tygr
    get :index
    assert_hpricot
    action = url_for(:action => :buy_items)
    assert_xpath "//form[ '#{ action }' = @action ]"
  end

The login_as method comes from one of Rails’s nifty authentication plugins. Then get :index simulates fetching the index page of our current controller. assert_hpricot parses the response, and assert_xpath reaches into it for a FORM whose action attribute matches the URL we expect.

Note that we always build URIs with url_for() and never hard-code FORM actions such as “/training/buy_items”. We don’t want our tests to break just because we changed routes.rb.

The test is not complete yet because it doesn’t do anything with the FORM. First, we will upgrade its Hpricot stylings.

CSS Selectors

The previous test used XPath to query for a FORM by its action; the next one uses CSS selector notation to identify the same FORM. Hpricot supports a subset of XPath, and CSS selectors, through the same interface, so we can always use whichever system is most convenient. For example, if we must target an element with multiple classes, such as <div class="class_A class_D" />, our first attempt at a matching XPath is odious and fragile:

  .//div[ contains(@class, "class_D") ]

That’s fragile because a different class, such as “class_Dismissed”, would provide a false match. A better XPath needs more tedious string manipulation in its [predicate] filter. The CSS notation is clearer and more accurate: “div.class_D”.
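The difference between substring and token matching is easy to see in plain Ruby. These three hypothetical helpers work on a raw class string rather than a parsed document, and emulate, respectively, the naive contains() query, the CSS class selector, and the usual XPath 1.0 workaround, contains(concat(' ', normalize-space(@class), ' '), ' class_D '), which pads the attribute with spaces so that only whole class names match.

```ruby
# Hypothetical emulations of three ways to match a class name.

# Naive XPath: contains(@class, "class_D") is a plain substring test.
def naive_class_match?(class_attr, name)
  class_attr.include?(name)
end

# CSS selector div.class_D: compare whole whitespace-separated tokens.
def css_class_match?(class_attr, name)
  class_attr.split.include?(name)
end

# XPath 1.0 workaround:
#   contains(concat(' ', normalize-space(@class), ' '), ' class_D ')
# split/join approximates normalize-space(); the padding spaces make
# only whole tokens match.
def xpath_token_match?(class_attr, name)
  normalized = class_attr.split.join(' ')
  " #{normalized} ".include?(" #{name} ")
end

p naive_class_match?('class_A class_Dismissed', 'class_D')  # => true (false match)
p css_class_match?('class_A class_Dismissed', 'class_D')    # => false
p xpath_token_match?('class_A class_Dismissed', 'class_D')  # => false
p css_class_match?('class_A class_D', 'class_D')            # => true
```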

So this test case finds our FORM using its unique id, not its action:

    form = assert_xpath('form#buying_items')
    action = url_for(:action => :buy_items)
    assert_equal action, form[:action]

This raises the question of how to test the link from that URI to its target action in the controller. We could rename that action, and this test wouldn’t break. Because unit tests for web sites cannot (yet) work with real servers and browsers, we must at least test each step with overlapping test cases. One case tests that we have a FORM, the next tests that it calls the right controller action, the next tests that the controller action does the right thing, and so on.

Submitting Forms

The Rails plugin form_test_helper works with assert_select (another useful assertion system based on an HTML parser and CSS selectors) to read a FORM’s input variables and present them as a convenient collection. We can assert that our FORM contains the right action, then assert that submitting our FORM, with its current field values, calls the action correctly.

  def test_buy_item_submit_form
    login_as :tygr
    get :index
    assert_hpricot
    form = assert_xpath('form#buying_items')
    action = url_for(:action => :buy_items)
    assert_equal action, form[:action]

    submit_form form[:action] do |post|
      assert_equal users(:tygr).id.to_s, post['user[id]'].value
      post[:prop_1].check
      post[:prop_4].check
    end
    #  assertions here should check the controller
    #  updated the model and database correctly
  end

submit_form passes the FORM’s fields into our block as post. We can assert that some automatic fields are populated correctly (including hidden ones), and we can simulate user input by changing some fields.

(Tip: Temporarily run p post.field_names, to remind yourself what your FORM contains.)

Conclusion

Hpricot’s XPath system cannot handle long, elaborate queries; use REXML if you need those. And Hpricot’s forgiveness is a benefit when retrofitting tests to ill-formed HTML, but a liability when building a site from scratch. Test cases should, as a side effect, coerce your code into improving its quality. If a super-strict test case based on REXML suddenly fails, revert your most recent edit and try again; this time you might not make the same mistake. Hpricot, in its default configuration, would not have warned you.

A test case can mix and match REXML and Hpricot freely, by passing the results of one into the base method of the other:

    def test_handoff
      assert_rexml '<anna><marie><candy><lights>' +
                   '  <since><imp>' +
                   '    <pulp lay="things" />' +
                   '  </imp></since>' +
                   '</lights></candy></marie></anna>'

      assert_xpath '/anna/marie/candy/lights' do |lights|
        lights = assert_hpricot(lights.to_s)  #  transfer a fragment of XML
        lights.since.imp.pulp{ @lay == 'things' }
      end  #  both assert_rexml and assert_hpricot
    end    #  support these query notations...

These assertions allow Rails view tests to move beyond reacting to code changes. You can upgrade a test to fail for the right reason, and then upgrade your code to pass the test. This improves confidence that your tests cover the right things, and you can change your code more freely without making mistakes.