Object-Relational Models, like Rails’s ActiveRecord, help generate very complex queries. Sometimes we need that complexity - without slow execution times! We might not notice when our queries have grown beyond our databases’s tuning and indices.
This post shows how to trap, interpret, and
EXPLAIN SQL SELECT statements under
MySQL. Our platform is Ruby
on Rails, yet the lessons apply to any ORM.
Unit tests (and “specs”) should run very fast over small
sample databases. They can’t directly detect if query execution
times exceed our clients’ goals. Unit tests should always seek
indirect measurements, especially when they generate intermediate
code. Just as
assert_xpath detects flaws in HTML before they ever
see a browser, our new assertion - inspect_sql -
will analyze ActiveRecord’s generated SELECT
statements, and will trap inefficiencies before they slow down a
full database.
Our tool list includes all the usual suspects:
inspect_sql, from the assert_efficient_sql gem- Ruby on Rails 2.1 (with ActiveRecord)
- MySQL
- RSpec
- assert{ 2.0 }
Some of the examples work with rboard, a blog by Radar. The installation instructions work with any ActiveRecord application.
The inspect_sql system uses
assert_efficient_sql - review its blog entry
here.
Get the latest version with gem install
assert_efficient_sql.
Installation
Start by adding this to one of your test suites or specifications:
require 'assert_efficient_sql'
Now we simulate a client requirement - our system must find all posts written by a user of a given level.
Note that RSpec encourages us to write our specification using our clients’ own verbiage. Behavior Driven Development is about helping clients participate with developers.
To simplify this post, we will write our find statements directly into our specifications. These statements stand as placeholders for call to real application methods. They would hide our statements, making their performance issues less obvious.
describe Post, 'finding all posts by a given user level' do fixtures :posts, :users, :user_levels it 'should be efficient' do sql = inspect_sql :verbose => true do mod = user_levels(:moderator) users = User.find_all_by_user_level_id(mod.id) Post.find_all_by_user_id(users.map(&:id)) end assert{ sql.length < 3 } end end
Those query statements may look excessive, but remember our specification only simulates well-factored code, where one method might select the user population and another might select its representative posts.
EXPLAIN yourself
To see just how effusive they are, we wrap their block with
inspect_sql
:verbose => true. The temporary
:verbose option
incites this spew when the spec runs:
query for SELECT * FROM `user_levels` WHERE (`user_levels`.`id` = 429228497) UserLevel Load select_type | key_len | id | table | type | possible_keys | rows | Extra | ref | key ------------------------------------------------------------------------------------------------- SIMPLE | 4 | 1 | user_levels | const | PRIMARY | 1 | | const | PRIMARY query for SELECT * FROM `users` WHERE (`users`.`user_level_id` = 429228497) User Load select_type | key_len | id | table | type | possible_keys | rows | Extra | ref | key ------------------------------------------------------------------------------------------ SIMPLE | | 1 | users | ALL | | 4 | Using where | | query for SELECT * FROM `posts` WHERE (`posts`.`user_id` IN (429228497)) Post Load select_type | key_len | id | table | type | possible_keys | rows | Extra | ref | key ------------------------------------------------------------------------------------------ SIMPLE | | 1 | posts | ALL | | 9 | Using where | |
The inspection revealed three blocks (each starting with
“query for“) indicating our code ran three SQL
SELECT statements. inspect_sql then
passed each statement through the MySQL EXPLAIN
command, revealing each query was itself not very efficient.
Although one used a PRIMARY key index, the other two
read ALL their rows to find their targets.
If we added an index, to fix this inefficiency, we could add a
line like assert{ sql.keys.include? :my_index }. Then the assertion
would pass until a new feature, or a sloppy refactor, changed the
query in a way that forced MySQL to stop using that index.
Note the system :symbolizes each string result, simply to
make them easier to type. All of them refer back to low-level
identifiers which should generally obey Ruby’s rules for an
unquoted literal :symbol. To
nest delimiters in these symbols, such as dots, quote them like
this:
:'rboard_test.users.user_level_id'
Fruit Roll-Ups
inspect_sql rolls the fruit of your queries up
into arrays of descriptions. Each query method collates columns
from the EXPLAIN results. Extra becomes
sql.extras, possible_keys becomes
.possible_keys, etc. Use these roll-up methods for
bulk assertions; for example, to assert that none of your methods
queried certain slow .tables, in preference to
faster ones.
To pass our last assertion, and pack everyone into one
SQL SELECT statement, we need fewer Ruby statements,
with more ActiveRecord magic:
it 'should be more efficient' do sql = inspect_sql :verbose => true do Post.find(:all, :include => { :user => :user_level }, :conditions => { :'user_levels.name' => 'Moderator' }) end assert{ sql.length == 1 } deny{ sql.types.include? :ALL } end
That spews this (reformatted) monstrosity:
query for SELECT `posts`.`id` AS t0_r0, `posts`.`text` AS t0_r1, `posts`.`created_at` AS t0_r2 ...
FROM `posts` LEFT OUTER JOIN `users` ON `users`.id = `posts`.user_id
LEFT OUTER JOIN `user_levels` ON `user_levels`.id = `users`.user_level_id
WHERE (`user_levels`.`name` = 'Moderator')
Post Load Including Associations
select_type | key_len | id | table | type | possible_keys | rows | Extra | ref | key
----------------------------------------------------------------------------------------------------------------------------------
SIMPLE | | 1 | posts | ALL | | 9 | | |
SIMPLE | | 1 | users | ALL | PRIMARY | 3 | Using where | |
SIMPLE | 4 | 1 | user_levels | eq_ref | PRIMARY | 1 | Using where | rboard_test.users.user_level_id | PRIMARY
The SELECT statement got longer, mostly due to
eager-loading all the tables we used. (If your code does not work
with that upgrade, replace the :include directive with :joins, and write the LEFT OUTER
JOIN parts yourself. This post won’t go there!)
More important, the EXPLAIN says MySQL does more
of the work for us. It now runs three queries internally to get
our one response. This would be more efficient - if we added
indices to these tables. The EXPLAIN block shows us
right where to start: At posts.user_id. The right
indices will allow our last deny{} line to pass;
when MySQL stops reading ALL our rows.
Matchmaker
A long method (such as a controller action) could query dozens
of supporting tables before running the one that interests you.
The first fix for this situation is a test case closer to the
Model layer. Then filter any remaining utility queries
out of your inspect_sql by adding a :match argument, with a Regular
Expression declaring which tables interest you. The
expression must match any part of the generated
SELECT statement, so chose a relatively stable
part:
def test_order_transactions_by_id sqls = inspect_sql :match => /SELECT conversations.\*/ do get_sales_after_transaction_id end statement = sqls.first[:statement] assert{ statement =~ /ORDER BY id asc/ } end
That test answers a particularily itchy TDD question -
how to assert that our query explicitly declared :order => 'id asc',
when that’s (apparently!) the MySQL default? To test other
orders, such as by name, you would simply ensure
your fixture tables declare some names out of order, then
trivially test that your query returned them in order. Forcing
MySQL to get a table’s PRIMARY key out of order -
if possible at all - probably requires too much rocket surgery.
Our tests have bigger fish to fry, but we should not give any
future programmers (including ourselves) any nasty surprises when
a code change pushes that query out of order. So, instead of testing
the query results, we test the code that
ActiveRecord generates. This indirectly measures our ORDER
BY clause.
inspect_sql helps one of TDD’s major strengths.
It helps tests make your code’s internals more visible.
