O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters

Batch Updates with PL/pgSQL
Pages: 1, 2, 3, 4, 5

Benchmarks

How does this refactoring translate into performance? It's hard to say really, because the performance of the previous iterations of the functions depends on how many IDs were in coll_ids. Nevertheless, I reworked my original benchmarking program into a new version that compares the new functions to the old, plus the original Perl version, just as a control. The results:

batch:  8.27 wallclock seconds (0.02 usr + 0.56 sys = 0.58 CPU) @ 36.28/s
 func: 10.54 wallclock seconds (0.03 usr + 0.56 sys = 0.59 CPU) @ 28.47/s
 perl: 16.46 wallclock seconds (0.11 usr + 2.77 sys = 2.88 CPU) @ 18.23/s

Assuming that my benchmarking script tests a typical number of objects in the collection, the constant functions (with the label batch) are about 27.5 percent faster than the looping versions (func), while running nearly twice as fast as when all the work is done from within Perl (which also loops). Yay for batch updates!

Download the test to test the difference for yourself.

The Race Condition Again

In the previous article in this series, I wrote that I had eliminated the race condition between two processes updating the same collection. That was a little misleading. I almost eliminated it. One of the reviewers pointed out that there now exists a different--and much more subtle--race condition.

The issue is that, although locking the appropriate entry table row prevents inserts from executing until the collection update finishes, PostgreSQL checks foreign key constraints after executing an INSERT. This is the reverse of how the new functions behave: they lock the entry row before they do anything else.

Why is this a problem? Consider two connections to the database, one updating a collection via these functions and the other updating the same collection via manual queries.

  1. Manual connection inserts into entry_coll_tag.
  2. Function connection gets lock on entry row.
  3. Manual connection attempts to check the foreign key constraint, thus requesting a lock on entry. It blocks while it waits for the function connection to commit.
  4. Function connection makes a conflicting insert into the collection.
  5. Function connection waits on the manual connection as part of checking the foreign key constraint. Result: deadlock.

The key here is Step 2: If the function gets a lock after the manual connection has inserted the new row but before it checks the foreign key constraint, and makes a conflicting insert, there will be a deadlock; the connections wait for each other to finish, and therefore neither of them ever does. This is a very narrow race condition--much narrower than the original race condition--but it is still very real.

What's the solution to this problem? It's simple, really: never allow any connection to use anything other than your functions to update a collection. The functions always acquire the lock before doing anything else, so there's no way that one connection using the functions can get into a deadlock with another connection using them.

The best way to enforce this rule is with PostgreSQL permissions. For example, suppose that you let applications connect to your database using a PostgreSQL user named appuser. First, revoke that user's permission to make changes to the entry_coll_tag table:

try=# REVOKE INSERT, UPDATE, DELETE ON entry_coll_tag
try-# FROM   appuser;
REVOKE

Then, create the functions as a user that has these permissions, and use the SECURITY DEFINER at the end of each function declaration:

-- ...
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

The SECURITY DEFINER clause allows the function to execute with the permissions of the user that defined the function, rather than the user calling it. Because a privileged user created the function, unprivileged users will be able to use it to make changes to the collection table, even if they can't change the collection table directly.

Conclusion

Functions that perform a finite number of tasks are nearly always preferable to those that perform a variable number of tasks based on the number of items to process. PostgreSQL kindly provides the tools to eliminate looping constructs in linear functions. By using the generate_series() function and PostgreSQL's batch update syntax, it's relatively straightforward to eliminate array-based looping constructs in PL/pgSQL functions. Now that those tools are at your disposal, get out there and take advantage of them!

Acknowledgments

My thanks to Josh Berkus for showing me how to use generate_series() and batch updates to eliminate the loops in my original collection functions. I learned a great deal in the process, and I am pleased to pass on the benefits of Josh's knowledge here. I'm also grateful to AndrewSN for pointing out the race condition and explaining it to me.

David Wheeler is a developer at Portland, Oregon-based Values of n, where he writes the code that makes Stikkit's little yellow notes think.


Return to O'Reilly Databases


Comments on this article


Tagged Articles

Post to del.icio.us

This article has been tagged:

postgresql

Articles that share the tag postgresql:

Managing Many-to-Many Relationships with PL/pgSQL (17 tags)

Writing PostgreSQL Functions with PL/pgSQL (11 tags)

Datamining Apache Logs with PostgreSQL (8 tags)

Introducing Slony (8 tags)

Batch Updates with PL/pgSQL (8 tags)

View All

sql

Articles that share the tag sql:

How to Misuse SQL's FROM Clause (74 tags)

Configuring Database Access in Eclipse 3.0 with SQLExplorer (28 tags)

Managing Many-to-Many Relationships with PL/pgSQL (18 tags)

Hierarchical SQL (16 tags)

The Outer Limits of SQL JOINs (14 tags)

View All

plpgsql

Articles that share the tag plpgsql:

Managing Many-to-Many Relationships with PL/pgSQL (6 tags)

Batch Updates with PL/pgSQL (5 tags)

View All

programming

Articles that share the tag programming:

Rolling with Ruby on Rails (1374 tags)

Very Dynamic Web Interfaces (279 tags)

Ajax on Rails (231 tags)

Understanding MVC in PHP (202 tags)

A Simpler Ajax Path (186 tags)

View All

database

Articles that share the tag database:

MySQL FULLTEXT Searching (54 tags)

Live Backups of MySQL Using Replication (53 tags)

Advanced MySQL Replication Techniques (53 tags)

Dreaming of an Atom Store: A Database for the Web (49 tags)

How to Misuse SQL's FROM Clause (38 tags)

View All

Sponsored Resources

  • Inside Lightroom

Related to this Article

Hands-On Oracle Application Express Security Hands-On Oracle Application Express Security
April 2013
$24.99 USD

PHP, MySQL, JavaScript & HTML5 All-in-One For Dummies PHP, MySQL, JavaScript & HTML5 All-in-One For Dummies
by Steve Suehring
March 2013
$44.99 USD

Advertisement
Sign up today to receive special discounts,
product alerts, and news from O'Reilly.
Privacy Policy >
View Sample Newsletter >
  • Youtube
  • http://www.youtube.com/OreillyMedia
  • Twitter
  • Subscribe
  • View All RSS Feeds >
O'Reilly Media

800-889-8969 or 707-827-7019
Monday-Friday 7:30am-5pm PT
©2011, O'Reilly Media, Inc.
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  • About O'Reilly
  • Academic Solutions
  • Contacts
  • Customer Service
  • Careers
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly
  • Community
  • Authors
  • Forums
  • Membership
  • Newsletters
  • RSS Feeds
  • User Groups
  • Partner Sites
  • makezine.com
  • makerfaire.com
  • craftzine.com
  • igniteshow.com
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com