Object-Relational Models, like Rails’s ActiveRecord, help generate very complex queries. Sometimes we need that complexity - without slow execution times! We might not notice when our queries have grown beyond our databases’s tuning and indices.

This post shows how to trap, interpret, and EXPLAIN SQL SELECT statements under MySQL. Our platform is Ruby on Rails, yet the lessons apply to any ORM.

Unit tests (and “specs”) should run very fast over small sample databases. They can’t directly detect if query execution times exceed our clients’ goals. Unit tests should always seek indirect measurements, especially when they generate intermediate code. Just as assert_xpath detects flaws in HTML before they ever see a browser, our new assertion - inspect_sql - will analyze ActiveRecord’s generated SELECT statements, and will trap inefficiencies before they slow down a full database.

Our tool list includes all the usual suspects:

Some of the examples work with rboard, a blog by Radar. The installation instructions work with any ActiveRecord application.

The inspect_sql system uses assert_efficient_sql - review its blog entry here.

Get the latest version with gem install assert_efficient_sql.

Installation

Start by adding this to one of your test suites or specifications:

require 'assert_efficient_sql'

Now we simulate a client requirement - our system must find all posts written by a user of a given level.

Note that RSpec encourages us to write our specification using our clients’ own verbiage. Behavior Driven Development is about helping clients participate with developers.

To simplify this post, we will write our find statements directly into our specifications. These statements stand as placeholders for call to real application methods. They would hide our statements, making their performance issues less obvious.

describe Post, 'finding all posts by a given user level' do
  fixtures :posts, :users, :user_levels

  it 'should be efficient' do
    sql = inspect_sql :verbose => true do
            mod = user_levels(:moderator)
            users = User.find_all_by_user_level_id(mod.id)
            Post.find_all_by_user_id(users.map(&:id))
          end

    assert{ sql.length < 3 }
  end
end

Those query statements may look excessive, but remember our specification only simulates well-factored code, where one method might select the user population and another might select its representative posts.

EXPLAIN yourself

To see just how effusive they are, we wrap their block with inspect_sql :verbose => true. The temporary :verbose option incites this spew when the spec runs:

query for SELECT * FROM `user_levels` WHERE (`user_levels`.`id` = 429228497)

UserLevel Load
   select_type | key_len | id | table       | type  | possible_keys | rows | Extra | ref   | key
   -------------------------------------------------------------------------------------------------
   SIMPLE      | 4       | 1  | user_levels | const | PRIMARY       | 1    |       | const | PRIMARY

query for SELECT * FROM `users` WHERE (`users`.`user_level_id` = 429228497)

User Load
   select_type | key_len | id | table | type | possible_keys | rows | Extra       | ref | key
   ------------------------------------------------------------------------------------------
   SIMPLE      |         | 1  | users | ALL  |               | 4    | Using where |     |

query for SELECT * FROM `posts` WHERE (`posts`.`user_id` IN (429228497))

Post Load
   select_type | key_len | id | table | type | possible_keys | rows | Extra       | ref | key
   ------------------------------------------------------------------------------------------
   SIMPLE      |         | 1  | posts | ALL  |               | 9    | Using where |     |

The inspection revealed three blocks (each starting with “query for“) indicating our code ran three SQL SELECT statements. inspect_sql then passed each statement through the MySQL EXPLAIN command, revealing each query was itself not very efficient. Although one used a PRIMARY key index, the other two read ALL their rows to find their targets.

If we added an index, to fix this inefficiency, we could add a line like assert{ sql.keys.include? :my_index }. Then the assertion would pass until a new feature, or a sloppy refactor, changed the query in a way that forced MySQL to stop using that index.

Note the system :symbolizes each string result, simply to make them easier to type. All of them refer back to low-level identifiers which should generally obey Ruby’s rules for an unquoted literal :symbol. To nest delimiters in these symbols, such as dots, quote them like this:

:'rboard_test.users.user_level_id'

Fruit Roll-Ups

inspect_sql rolls the fruit of your queries up into arrays of descriptions. Each query method collates columns from the EXPLAIN results. Extra becomes sql.extras, possible_keys becomes .possible_keys, etc. Use these roll-up methods for bulk assertions; for example, to assert that none of your methods queried certain slow .tables, in preference to faster ones.

To pass our last assertion, and pack everyone into one SQL SELECT statement, we need fewer Ruby statements, with more ActiveRecord magic:

  it 'should be more efficient' do
    sql = inspect_sql :verbose => true do
      Post.find(:all, :include => { :user => :user_level },
                   :conditions => { :'user_levels.name' => 'Moderator' })
    end

    assert{ sql.length == 1 }
    deny{ sql.types.include? :ALL }
  end

That spews this (reformatted) monstrosity:

query for SELECT `posts`.`id` AS t0_r0, `posts`.`text` AS t0_r1, `posts`.`created_at` AS t0_r2 ...
            FROM `posts`  LEFT OUTER JOIN `users` ON `users`.id = `posts`.user_id
                          LEFT OUTER JOIN `user_levels` ON `user_levels`.id = `users`.user_level_id
           WHERE (`user_levels`.`name` = 'Moderator')

Post Load Including Associations
   select_type | key_len | id | table       | type   | possible_keys | rows | Extra       | ref                             | key
   ----------------------------------------------------------------------------------------------------------------------------------
   SIMPLE      |         | 1  | posts       | ALL    |               | 9    |             |                                 |
   SIMPLE      |         | 1  | users       | ALL    | PRIMARY       | 3    | Using where |                                 |
   SIMPLE      | 4       | 1  | user_levels | eq_ref | PRIMARY       | 1    | Using where | rboard_test.users.user_level_id | PRIMARY

The SELECT statement got longer, mostly due to eager-loading all the tables we used. (If your code does not work with that upgrade, replace the :include directive with :joins, and write the LEFT OUTER JOIN parts yourself. This post won’t go there!)

More important, the EXPLAIN says MySQL does more of the work for us. It now runs three queries internally to get our one response. This would be more efficient - if we added indices to these tables. The EXPLAIN block shows us right where to start: At posts.user_id. The right indices will allow our last deny{} line to pass; when MySQL stops reading ALL our rows.

Matchmaker

A long method (such as a controller action) could query dozens of supporting tables before running the one that interests you. The first fix for this situation is a test case closer to the Model layer. Then filter any remaining utility queries out of your inspect_sql by adding a :match argument, with a Regular Expression declaring which tables interest you. The expression must match any part of the generated SELECT statement, so chose a relatively stable part:

  def test_order_transactions_by_id
    sqls = inspect_sql :match => /SELECT conversations.\*/ do
      get_sales_after_transaction_id
    end
    statement = sqls.first[:statement]
    assert{ statement =~ /ORDER BY id asc/ }
  end

That test answers a particularily itchy TDD question - how to assert that our query explicitly declared :order => 'id asc', when that’s (apparently!) the MySQL default? To test other orders, such as by name, you would simply ensure your fixture tables declare some names out of order, then trivially test that your query returned them in order. Forcing MySQL to get a table’s PRIMARY key out of order - if possible at all - probably requires too much rocket surgery.

Our tests have bigger fish to fry, but we should not give any future programmers (including ourselves) any nasty surprises when a code change pushes that query out of order. So, instead of testing the query results, we test the code that ActiveRecord generates. This indirectly measures our ORDER BY clause.

inspect_sql helps one of TDD’s major strengths. It helps tests make your code’s internals more visible.