At RailsConf 2007 DHH mentioned that Rails 2.0 would support query caching on the client side in order to speed up AR. I immediately thought to myself, “Huh? Why do it on the client side when the database server will handle that?”.
The answer is that ActiveRecord (AR) doesn’t support bind parameters. In fact, AR is downright deceptive in this regard, because it sure *looks* like it’s using bind parameters. Consider this simple example:
orders = Order.find(
:all,
:conditions => 'name = ?', 'Daniel Berger'
)
In DBI, that query would look something like this:
...
sth = dbh.prepare('select * from order where name = ?')
sth.execute('Daniel Berger')
orders = sth.fetch
...
Anyone coming from a DBI background would look and the AR version and think that bind parameters were being used. But, in fact, AR is doing variable interpolation behind the scenes. This is not optimal for databases that support bind parameters. But first, a bit about Oracle and bind parameters. This mostly applies to PostgreSQL as well 1.
When you send a query to an Oracle DB server a few things happen. First, the query is parsed to ensure that it’s formed properly. Then, Oracle determines the execution plan (aka ‘explain plan’). The execution plan is, in short, the strategy that the DB server forms in order to fetch the rows for the query, e.g. “do a full table scan on table X and an index scan on table Y”. Finally, it fetches the data.2 Parsing the SQL and creating the execution plan is the most expensive portion of the operation.
Back to the part about why not supporting bind parameters is a bad idea. First, and foremost, is performance. With variable interpolation SQL is re-parsed and the execution plan is re-formed every time the query is run. By using bind parameters the execution plan is generated only *once* (and is stored in a query cache) since the strategy for fetching the rows doesn’t change, merely the particular column value that you happen to be looking for.
To prove my point, here’s some sample code that’s very similar to what I use in a production report. It grabs a list of telephone numbers from a plain text file and gets the necessary information based on that number. The first example uses variable interpolation, the second uses bind parameters. Benchmarks then follow. Column and table names changed to protect the innocent:
# Variable Interpolation
IO.foreach('numbers.txt'){ |tn|
sth = dbh.prepare("
select so.order_number, so.order_id, so.service_number
from some_order so, network nh, service se
where se.service_id = so.service_id
and nh.child_link = se.child_link(+)
and nh.telephone_number = '#{tn}'
")
sth.execute
info = sth.fetch
}
# Bind parameters
sth = dbh.prepare("
select so.order_number, so.order_id, so.service_number
from some_order so, network nh, service se
where se.service_id = so.service_id
and nh.child_link = se.child_link(+)
and nh.telephone_number = ?
")
IO.foreach('numbers.txt'){ |tn|
sth.execute(tn)
info = sth.fetch
}
When I ran this against our reports database against 10,000 telephone numbers, the first example took approximately 3:50 in repeated runs, while the second example took approximately 1:45 in repeated runs.3 This example is typical and, in fact, most of my reports consist of nested cursors and much more complex SQL. In those cases the performance difference is even more significant.
In addition bind parameters are also a better defense against SQL injection attacks. AR protects you via quoting. It also cuts down on CPU cycles being used by the DB server, which the DBA’s will appreciate. Lastly, bind parameters are *absolutely necessary* for getting at binary data. This point is crucial, because it’s a problem that caching queries on the client side won’t solve.
What can we do about it? I believe Izumi 4 has the best idea - refactor the AR design to support actual bind parameters for those vendors that support them, and fake the rest with quoting to make the interface seamless. When necessary, disable automatic binding for those situations where they aren’t ideal. 5 In the case of MySQL, that would probably be often, since bind parameters actually seem to slow down many queries. Furthermore, I’ve been told that the MySQL query cache is cleared every time an INSERT or UPDATE occurs - not very useful.
His suggestion has been submitted, but the adapters still need work. Please take a look.
One final Oracle-specific note. Oracle has the concept of “hints”, little bits of meta information you can embed in the sql directly in order to alter the way Oracle generates its explain plan and/or to change the behavior of the results, e.g. PARALLEL to take advantage of multiple cpu architectures and FIRST_ROWS to optimize for immediate results.6. I’d like to see the Oracle adapter support this, and it would be easier to integrate if we adopt Izumi’s architecture.
See you next Wednesday.7
1 I’ll talk about MySQL later. I am not familiar enough with DB2 or SQLServer to comment, but my hunch is that they support bind parameters as well.
2 For a more in depth explanation of what’s going on behind the scenes, please see Luca Mearelli’s excellent article at http://www.oracle.com/technology/pub/articles/mearelli-optimizing-oracle-rails.html
3 For those wondering, I actually used the NOCACHE hint in the SQL to make sure the DB server wasn’t using a pre-existing cache.
4 http://izumi.plan99.net/blog/ - scroll down a bit for the SVG. Don’t use IE.
5 Such as histograms and ‘like’ queries.
6 Tools like TOAD use this so they can display a small result set immediately before all the rows are actually fetched. This would be useful for pagination.
7 On a final note I would like to thank Tim Bunce of (Perl fame) for his DBI book that taught me some basic knowledge of bind parameters back when I was a fledgling Perl programmer.

Note that recent versions of Rails use an Oracle db setting for cursor sharing that effectively emulates bind variables. It's not as good as real bind variables, but performance tests indicate that it's nearly so. Agreed that true support is better, and a bunch of folks are working on that.
Also, I've made some local changes to support the use of Oracle hints, I can probably pull that into a patch for core.
Finally, note that the query caching DHH talked about is still FAR superior (in those cases in which it applies) to any sort of db caching. Not hitting the db will always trump having to hit the db, even if the db's got the same data cached.
The query cache coming in Rails 2 (as of rev 6202) isn't a very good idea for sites running more than a moderate number of Rails processes since it wastes memory. memcached should be used for instead, so you get the largest cache size possible.
The MySQL query cache is completely unrelated to prepared statements. There's some effort underway for 5.1/6.0 (I forget which) to make the cache usable by prepared statements, but the basic principle of MySQL's query cache is this: if the bytes of an incoming SQL SELECT match exactly a cached SELECT, the last output is substituted. That's all. And that's why writes nuke the cache: data in the changed table may not match the cached result sets anymore.
Hi. I guess I'm another person working on getting bind variables working. Following up on some conversations at RailsConf, I figured I'd have a shot at seeing what could be done without refactoring Active Record. FWIW, I've got an experimental patch for improving bindvar handling in Oracle. It's very fresh code, and incomplete (doesn't handle NULL or date values), but it does pass Active Record unit tests, so it might be fun for those inclined to play with such things.
(Unfortunately, doing this strictly at the adapter level requires a slightly dirty trick. If the adapter's quote method returns invalid SQL, it turns out that the only thing that will notice is the adapter's own execute method, which then has a chance to fix it up. So quote can leave a trail of bread crumbs in the generated "SQL" which execute can use to find the places where a quoted value got substituted in. It's not pretty, but at least in the AR unit tests, it doesn't blow up...)
I think DB2 dose support bind variables/ query caching. I get the impression it makes quite a difference to performance.
Bind variables were the number 1 performance improvement to be made in most applications that I've seen (sad as that is). One of my favorite consulting tricks was to get current performance statistics, get the top 10 queries not using bind variables, change a couple, and rerun statistics. I've seen orders of magnitude of change in the past (at one place changing their 2 worst queries decreased load from a 22 on average (8 proc box) to a 3 and increased query throughput by over 4 x. Actually this was the first shortcoming I saw with AR as most apps I've been pulled in to consult on didn't make use of these properly (and it was the quick and easy way to "show value").
DB2 (mainframe and UDB) and SQL Server definitely use bound parameters.
I think a database without them is unable to claim being a database.
Having said this, I was under the impression AR used bound parameters for PostgreSQL- I am surprised it does not.
I think that bind parameters are essential. If Rails is going to compete effectively in the web app arena, it needs to make use of every performance advantage it can. This is a big one.
Oracle and bind variables... in tests like this the test code with bind variables is always 2-3 times faster. But... make the test code multi-threaded, and run it against oracle on a 4 or more CPU server. (hint - entry level Sun Sparc servers now come with effectively 32 CPUs!) Due to latches in the statement cache in Oracle forcing massive concurrency contention, the proper use of bind variables in Oracle can easily make a 10x or more difference in system capacity. Ie, use bind variables and handle 10x more users on the same db with the same page response time.