O'Reilly    
 Published on O'Reilly (http://oreilly.com/)
 See this if you're having trouble printing code examples


Using Perl in PostgreSQL

by Andrew Dunstan
11/10/2005

Most Perl users are familiar with using Perl to talk to databases. Perl's DBI is, along with ODBC and JDBC, one of the most common and widely ported database client interfaces. The DBI driver for PostgreSQL, DBD::Pg, is very well-maintained, and quite featureful. For example, it recently acquired proper support for prepared statements. Previously, the client library had emulated these, but with the latest DBD::Pg and PostgreSQL distributions, you can get real prepared queries, which can lead to big performance gains in some cases.

However, there is another way of using Perl with PostgreSQL--writing little Perl programs that actually execute inside of the server. This way of using Perl is less well known than using the DBI driver, and is, as far as I know, unique to PostgreSQL. It lets you do some very cool things that you just can't do in the client.

Server-Side Languages

In fact, PostgreSQL lets you create server-side routines in quite a few languages, including one called PL/PGSQL that is all its own, and is somewhat similar to Oracle's PL/SQL. The PostgreSQL core distribution supports and maintains three other procedural language interfaces to third-party interpreters: Perl, Python, and Tcl (the first procedural language that PostgreSQL supported). There are also other languages maintained outside of the core distribution for various reasons, including PL/Java (or an alternative flavor, PL/J), Pl/R, PL/Ruby, PL/PHP, and a vastly better PL/Python. If you, like me, are at home in Perl, you will probably want to write your server-side functions in Perl, too.

PostgreSQL languages come in two flavors: trusted and untrusted. Trusted languages are those that guarantee not to touch the file system, or other machine resources, while untrusted languages make no such promise. Postgres also protects your machine by refusing to run as root (or a similarly privileged user on Windows). But on a database server, your most valuable asset is probably the data itself, and so you need some additional protection where code might maliciously attack your data via access to the machine's resources. For that reason, only highly privileged database users ("superusers") can create functions in untrusted languages. Only they are allowed to install procedural languages at all, trusted or untrusted. Unless you have installed another language, the only ones available are SQL (which is trusted) and C (which is untrusted).

Enabling PL/Perl

PL/Perl actually comes in both of these flavors--the trusted version runs inside the standard Perl Safe container, with very few native Perl operations allowed. The easiest way to install either flavor of PL/Perl in a database is via the createlang program that should be part of your distribution. For example:

$ createlang plperl mydb

For the untrusted version, use instead:

$ createlang plperlu mydb

A Simple Example

The easy way to show how to use PL/Perl is to create a very simple function; one that would be a lot harder to do otherwise. Suppose that you want to test if a given piece of text is a palindrome (a word that reads the same backwards as forwards), disregarding white space and the case of the letters. Here's a piece of SQL to define the function:

create function palindrome(text) returns boolean 
language plperl 
immutable
as '

  my $arg = shift;
  my ($canonical = lc $arg) =~ s/\s+//g;
  return ($canonical eq reverse $canonical) ? "true" : "false";

';

Given this function, you can write SQL like:

select name, region, country, population
from towns
where palindrome(name);

If you can't build a test like this on the server side, you have to get all of the towns in the client and filter there, but that's horribly inefficient. Getting the server to test for you is far nicer.

The create function statement declares a function. It requires a name, an argument type list (which can be empty, but you must use the parentheses), a result type, and a language. In this case, I added a further argument, immutable, which tells Postgres that I guarantee that the function value depends only on its input, enabling it to do some optimization. Finally, there is the AS clause, followed by an SQL string literal that contains the body of the function.

The body is actually the body of an anonymous subroutine--the glue code wraps it up in a call to the Perl interpreter to create this subroutine and return the reference to the glue code, which stashes it away for later use. The glue code also stores the text in the database catalogs for later retrieval if necessary. Postgres compiles each function once per database session; it does not cache any Perl bytecode. From a programmer's point of view, remember that it wraps up your code in something like:

sub { <your text> }

PLPerl function arguments appear in @_, just like in regular Perl subroutines, and your code can handle them the same way. The string that contains the body is a normal SQL string and has to obey the same escaping rules as other SQL strings. Because this can lead to some considerable ugliness with strings that need reparsing, version 8.0 of PostgreSQL introduced an alternative quoting mechanism. Implemented by Tom Lane and me, it's colloquially known as "dollar quoting," and can make function bodies more readable in SQL code. I'll use that for my subsequent examples.

The argument arrives as a string, no matter what type it has in the database. For simple types, just return a string, which must be a valid literal of the return type. The SQL value NULL maps to the Perl value undef, both for arguments and for return values.

Practical PostgreSQL

Related Reading

Practical PostgreSQL
By John C. Worsley, Joshua D. Drake

Writing Triggers in PL/Perl

Up until PostgreSQL version 8.0, functions like the one above were just about all you could do with PL/Perl. They're useful, but not world-shattering. Version 8.0 gave PL/Perl a major shot in the arm, and more good stuff is coming in version 8.1. From version 8.0, you can do these cool new things:

Triggers can be extremely useful. Users of PostgreSQL and commercial high-end database systems have long used them to solve all sorts of problems that otherwise the client might have to handle (and often unreliably). One common use is for auditing changes. The audit.sql file accompanying this article contains an example of an account transaction table with an attached trigger that logs every change to the table. The output from the script looks like this:

  mod   | user_id |              ts               |                   new_txn                   |                   old_txn                   
--------+---------+-------------------------------+---------------------------------------------+---------------------------------------------
 INSERT |      10 | 2005-07-17 18:50:28.991595-04 | (1,1,1000.00,"initial 'deposit'")           | 
 UPDATE |      10 | 2005-07-17 18:50:29.051915-04 | (1,1,1000.00,"modified: initial 'deposit'") | (1,1,1000.00,"initial 'deposit'")
 DELETE |      10 | 2005-07-17 18:50:29.05932-04  |                                             | (1,1,1000.00,"modified: initial 'deposit'")
(3 rows)

Although there are no transactions in the transactions table, this contains a complete record, or audit trail, of everything that has been done to the table. In particular, it logs what happened, who did it, and when they did it. It doesn't matter what SQL they used; whatever they did to change the transactions table in any way is in the log. This is the sort of accountability that auditors and lawyers love, and it is very easy to achieve with triggers.

There is quite a lot to notice in this very simple example, especially if you usually use a database that doesn't provide all of these features. First, notice that everything goes into a schema called accts. A SQL schema is a namespace, and names of objects only have to be unique within a schema. Moreover, dropping a schema with cascading drops everything inside of the schema, so cleaning up is easy.

Next, notice that the declaration of the function current_sysid() is VOLATILE rather than IMMUTABLE. This is because its value does not depend only on its arguments, and furthermore, its value could potentially change within a single table scan via SECURITY DEFINER (a sort of setuid mechanism).

There is nothing very remarkable about the txn table, but the txn_audit table looks very odd--the new and old fields are of type accts.txn! This uses a new feature in PostgreSQL version 8.0 that allows tables to contain fields of opposite types. Because the name of a table is also the name of a corresponding composite type, you can use a table name as the type of a field, and that field will take exactly the same type of record as the table. This neat arrangement means that no matter how complex the base table, the audit table needs only these five fields. Without this facility, the schema would have to duplicate all of the fields in the base table twice!

No matter what the language, the trigger must be a function that takes no arguments and returns the special type trigger. Trigger functions in PL/Perl receive a special object called $_TD that contains lots of useful information about what caused it to be called. Of particular interest in this function are the event type that causes the trigger to fire and the values of the old and new data. Other interesting values in other circumstances include the table being modified, whether the trigger is called before or after the update, and any arguments passed to it by the trigger setup. The documentation contains a full list. Most of the work of the function consists of constructing an SQL statement to populate the audit table. It takes special care to escape the details fields, both to avoid upsetting the SQL parser and to avoid SQL injection attacks. Ideally, the trigger could use a prepared query instead of much of this, but that feature likely won't arrive until PostgreSQL 8.2.

After constructing the SQL query, the code uses another feature that is new in version 8.0--the ability to call back to the database to perform some action. spi_exec_query() does the work and returns a hash with some information, including a status field. If this is the expected SPI_INSERT_OK value, great! If not, the code returns the value SKIP to Postgres, telling it not to perform the modification to the txn table at all. If the transaction cannot be logged, the table modification should not happen. Another option is to modify the values in $_TD->{new} and return MODIFY. This would have caused PostgreSQL to use the modified value in the insert or update, rather than the value it originally intended to use. Instead of doing either of these things, the code returns undef, which tells PostgreSQL that it can simply proceed with modifying the table, having logged the modification.

Finally, the code calls CREATE TRIGGER to tie the newly created function to all of the events on the txn table.

Conclusion

Triggers are an immensely powerful and useful database tool, and I have only scratched the surface here. Being able to write them in Perl means that if you already know Perl, you don't have to learn a new language just so you can use them. So you get productive in much less time, and with greater comfort.

Part two of this series will look at the other new features of PL/Perl that are available in version 8.0 of PostgreSQL, and part three will look at what is coming in version 8.1 and beyond.

Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.


Return to the Databases DevCenter

Copyright © 2009 O'Reilly Media, Inc.