O'Reilly    
 Published on O'Reilly (http://oreilly.com/)
 See this if you're having trouble printing code examples


The Future of Perl in PostgreSQL

by Andrew Dunstan
05/25/2006

My first article about PL/Perl discussed using triggers written in PL/Perl, and the second showed how to exchange data with the database and running database commands from inside PL/Perl. This final article in the series explores the improvements in PostgreSQL versions 8.0 and 8.1 and some of the possible improvements coming down the track.

Updates

PostgreSQL 8.0 and 8.1 have added and improved several features. Most of them make life easier for developers.

Communication Between Functions

Before version 8.0, each trusted function received its own Safe container. There was no effective way to share data between trusted functions, or between trusted and untrusted functions. Untrusted functions could share data with each other, as they shared a common global namespace, but having to use untrusted functions just to be able to share data was not very satisfactory. Starting with version 8.0, all trusted PL/Perl functions share a single Safe container, and there is a special shared hash called %_SHARED that is available to all PL/Perl functions, both trusted and untrusted, to allow them to share data. This can also be very useful if you need to save some state between function calls.

Suppose that you want to compute the cumulative sum of a certain column. You could do it with a set returning function as shown in the previous article, or you could do it far more simply using shared data and a pair of tiny PL/Perl functions:

    CREATE FUNCTION init_csum() 
    RETURNS void 
    LANGUAGE plperl AS $$

        $_SHARED{csum} = 0.0;

    $$;

    CREATE OR REPLACE FUNCTION csum(float) 
    RETURNS float
    LANGUAGE plperl AS $$

        my $observation = shift;
        return $_SHARED{csum} += $observation;

    $$;

Be sure to call init-csum before each statement where you call csum, to reset the accumulator.

To use this, make sure that the data is in the right order before it gets passed to the csum function. The easiest way to do this is with a subquery:

    SELECT init_csum();

    SELECT date, 
           to_char(rain,'99D99') as rainfall, 
           to_char(csum(rain),'999G999D99') as "Cumulative Rainfall"
    FROM (SELECT date, 
                 rain
          FROM daily_rain
          WHERE date BETWEEN '2004-01-01' and '2004-01-09'
          ORDER BY date
         ) as inner_query;

The result (which looks like it comes from a rainforest) looks like:

      date    | rainfall | Cumulative Rainfall 
  ------------+----------+---------------------
   2004-01-01 |   8.40   |        8.40
   2004-01-02 |   6.35   |       14.75
   2004-01-03 |   9.47   |       24.22
   2004-01-04 |   4.91   |       29.13
   2004-01-05 |   1.14   |       30.28
   2004-01-06 |   7.68   |       37.96
   2004-01-07 |   2.03   |       39.98
   2004-01-08 |    .25   |       40.23
   2004-01-09 |   4.01   |       44.24
  (9 rows)

UTF-8 Support

From version 8.1 of PostgreSQL, the PL/Perl glue code detects if the database uses UTF-8 encoding. If so, then it encodes strings for PL/Perl as UTF-8 so that Perl treats the strings the right way. The code detects at compile time whether the Perl version is modern enough to support UTF-8 encoded strings, and only takes this action if so.

There is nothing that the programmer should need to do to get UTF-8 support--if it's available, it just happens automatically.

use strict;

You might think that this is something that programmers could choose to do. However this is not so--calling use is a forbidden operation inside of the Safe containers. Yet as all good Perl programmers know, strict mode is very nearly the greatest thing since the birth of Larry Wall. PostgreSQL version 8.1 has some switches that let you turn strict mode on (and off, if you really must).

The best way to do this is globally in the database server config file postgresql.conf. Two simple lines there will turn on strict mode and you can not worry any more about it:

  custom_variable_classes = 'plperl'
  plperl.use_strict = true

The first line allows a prefix (or list of prefixes) for a configuration settings that the PostgreSQL core code doesn't know about. (This facility originated for use by PL/Java, but is usable by any loadable modules--it's all part of the PostgreSQL extension philosophy.) The second line sets a flag that the PL/Perl code checks.

To allow strict mode, but not to force it, specify the custom_variable_classes setting.

Another way to turn on strict mode in trusted code is to prepend some code to your function:

  BEGIN { strict->import(); }

Finally, if you need to turn on strict mode temporarily, and you have set the custom_variable_classes properly, use:

  SET  plperl.use_strict = 'true';

Doing this will not last past the current session, and will not affect already-compiled PL/Perl functions. I recommend to turn it on once in your postgresql.conf file and then live happily ever after.

Validator Function and Logged Non-Fatal Warnings

From version 8.1, PL/Perl has a validator function that is called when you create a function. Previously, PL/Perl did not compile the function when it was created, preferring to wait until its use. That was something of a nuisance, so PostgreSQL now compiles it using the validator when it is created. Of course, there are many Perl errors that you can't catch until the code runs, and this is no different. Nevertheless, it is it still useful--just as Perl's -c command line flag is useful.

Version 8.1 also helps developers by no longer dropping non-fatal warnings. They are now logged nicely at level NOTICE.

The Future

Disclaimer: this section reflects my personal views, so please don't take it as in any way an official position by anyone. Here are some ideas that a few people have discussed. In several cases, there have been patches for them which have either not been of acceptable quality, or arrived too late to make the last release. In other cases, they are just ideas that have floated around.

Prepared Queries

This is a very important item and I am very happy to say that we have added this feature for version 8.2 of PostgreSQL.

PL/Perl does not currently have any way to create or call parameterized queries. This makes creating queries a lot less pleasant than it might be, as the audit example in the first article in this series showed. Prepared queries are not only an excellent device to use for programming clarity and ease, but a major defensive tool to use against SQL injection attacks. Any time you see user input being interpolated into an SQL statement, you know you might be dangerous ground.

You can't always use prepared queries in every possible situation where you might need to interpolate data into SQL, so you always have to be a bit careful. In particular, you can't use them to avoid interpolating table names, because PostgreSQL needs that information in order to plan the query in the first place. Some query types can't be prepared at all. Nevertheless, they are extremely useful and will be an important addition to PL/Perl.

Also, prepared queries can be far more efficient--you only need to call the SQL parser once per preparation, rather than once per execution, so if you are using the same query over and over again this can be a big win. A typical situation for this is when using an INSERT statement to create thousands of new rows.

Support PostgreSQL Polymorphic Types

PostgreSQL has two polymorphic pseudo-types, anyelement and anyarray, that let you write functions that work more or less with any actual arguments. This can save lots of work--you might only have to write one function instead of a whole heap of them. PL/Perl does not currently support these polymorphic types, however. I hope it will in the next release.

OUT and IN OUT Parameters

Version 8.1 of PostgreSQL introduced support for OUT and IN OUT parameters, as set out in the SQL standard. (PL/SQL programmers will be very familiar with these.) However, PL/Perl does not currently implement this, so it is an item on our to-do list.

Named Parameters

PostgreSQL version 8.0 introduced support for named function parameters, but PL/Perl ignores the function name. It would be nice to get the glue code to create lexical aliases for the parameters using their supplied names (if any). There are several wrinkles in this, including differences in what PostgreSQL can use as an identifier and what Perl can use, and possible breakage of legacy applications if we are not careful. But I think it's worth trying, at least.

Complete Translation of Array and Record Types

In general, we have tried to ensure that a Postgres composite (that is, a record) type enters PL/Perl as a hashref and leaves as a hashref, and that an array enters PL/Perl as an arrayref and returns as an arrayref. However, there are a few places where we haven't yet completed this--mainly for arrays and composites nested inside of other composites. We need to complete this so that PL/Perl functions never need to construct or deconstruct array and composite literals.

Provide a Complete DBI Handle to Talk to the Database

We would like to provide a full DBI handle to the PL/Perl user, so that programmers who are familiar with using DBI don't have to learn another API. In fact, this has been done, but it has not been kept completely up to date, and has a license that conflicts with core PostgreSQL. We intend to implement this under PostgreSQL's BSD license and include it in PL/Perl, where we can maintain it as part of the core distribution.

Preload Library Modules and Specified Functions

Currently, there is no way to load database functions or library modules when the PL/Perl interpreter begins. That can mean a significant penalty when first calling a function or loading a library. In certain cases we could dramatically lower this penalty, especially if we could get the postmaster to preload the PL/Perl module and then all the functions and libraries, so as to pay this penalty only once instead of once per connection.

This is probably the most speculative of all of the ideas in this section, and still needs lots of thought and discussion.

Not on my List: Direct Function-to-Function Calls

From time to time people ask why PL/Perl functions cannot call one another directly. There are several technical reasons. First, they have no names. In fact, the only reference to them is in an anonymous hash that the glue code holds, but does not make available to the Perl code. Apart from that, there is no guarantee that the interpreter knows the function at all. When PostgreSQL calls a PL/Perl function, it first checks to see if it has compiled the function. If not, it compiles the function before calling it. We can't do that from inside of PL/Perl--for one thing, we would in any case have to call back to the database to get the source code, and trusted functions can't call eval anyway. This is not an easy problem to solve.

In fact, I am not convinced that it is worth solving. If we had nicely pre-loadable modules, especially if we could make them available safely to trusted code, then applications could move most of their features into modules and leave the directly visible functions as a thin veneer over the libraries. This might well reduce any requirement for direct call capability.

Perhaps someone will have an inspirational moment that shows them how to overcome all the difficulties. Until that happens, I for one think that this is one request that is not worth doing.

Summary

PL/Perl has come a long way in quite a short period. Twelve months or so ago, none of the features shown in this series of articles were present in the stable release of PL/Perl. I hope you will agree that those of us who have worked on it have made great progress. Special credit must go to CommandPrompt, Inc. for the trigger, composite, and initial SPI work; to Abhijit Menon-Sen for fixing our memory problems so well; and to David Fetter for documentation support.

I hope too that I have interested you with some of the directions we might be taking, and that you are motivated to try PL/Perl for yourself.

Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.


Return to the O'Reilly Databases

Copyright © 2009 O'Reilly Media, Inc.