Batch Updates with PL/pgSQL
Pages: 1, 2, 3, 4, 5
Batch-Update Syntax
PostgreSQL supports inserting multiple rows at once using its special INSERT INTO expression syntax, which essentially allows you to select a series of rows and then insert them elsewhere, all in a single statement. For example, suppose that you want to copy all of the rows in table foo into table bar (that is, all rows that aren't already in table bar). Here's how to do it in a single INSERT statement:
INSERT INTO bar (a, b, c)
SELECT d, e, f FROM foo
WHERE foo.d not in (SELECT a FROM bar);
That's it. By using a SELECT statement in the expression part of the INSERT statement, you can insert multiple rows in a single statement, and even limit the rows through the use of the WHERE clause. Of course, I'm sure that you don't duplicate data like this unless you're deprecating table foo in favor of table bar, right? This syntax does have its uses, though.
As for batch updates, you likely already know that you can update multiple rows at once using a WHERE clause in an UPDATE statement. As with INSERT, UDPATE can also update a series of rows from an expression, generally a SELECT statement. This is possible because, in PostgreSQL, a FROM clause can be an expression, and UPDATE supports the FROM clause. For example, if you want to update all of the rows in a bar table with values from the foo table, where bar.a has the same value as foo.d, you can write:
UPDATE bar
SET b = foo.e, c = foo.f
FROM (SELECT d, e, f FROM foo) AS foo
WHERE a = foo.d;
Pretty simple, right? Not only can you use an expression in the FROM clause, but you can use AS to name it so that the remaining SQL code can treat it exactly like a table. Now how do you construct such SELECT statements within your functions?
From Arrays to Sets
To take advantage of bulk updates in collection functions, you need some way to create a SELECT statement that generates a row for every value in the array passed to the function. Then you can use that SELECT statement as part of the INSERT and UPDATE statements to update the collections with the same number of queries, regardless of how many IDs are in the array.
When I initially tackled this problem, my first attempt was to create a function that converts an array into a set. Fortunately, this turned out to be pretty easy, given how set-returning functions work:
CREATE OR REPLACE FUNCTION array_to_set(
arr anyarray
) RETURNS SETOF anyelement AS $$
BEGIN
FOR idx IN array_lower(arr, 1)..array_upper(arr, 1) LOOP
RETURN NEXT arr[idx];
END LOOP;
END;
$$ LANGUAGE plpgsql;
PostgreSQL provides several pseudo-types, which are convenient for contexts that need to handle any number of different data types. Because this example code doesn't really care about the data type of the array it converts, I have declared that array_to_set() can take any kind of array--denoted as anyarray--as an argument, and that it returns a SETOF anyelement. Indeed, this function works pretty well:
try=% select * from array_to_set( ARRAY[3,4,10,56,2] );
array_to_set
--------------
3
4
10
56
2
(5 rows)
Then I realized that I needed not only each value in the array, but also its position in the array so that I could populate the ordering column in the collection table. That would be trickier to write, given the limited return values of a function. Fortunately, there is another approach.
Series Generation
PostgreSQL comes with a very nice set-returning function called generate_series(). This function takes three arguments: a beginning number, an ending number, and an optional step
number, and returns a set of numbers from the beginning number to the ending number, each incremented by the step number. It's easiest to understand when you see it in action:
try=% select * from generate_series(2, 4);
generate_series
-----------------
2
3
4
(3 rows)
try=% select * from generate_series(1, 10, 2);
generate_series
-----------------
1
3
5
7
9
(5 rows)
Nice, eh? It's a powerful solution to the problem of creating a SELECT statement that generates rows for each number in the series and each value in an array--without having to call the array_to_set() function at all:
try=% SELECT gs.ser, coll.ids[gs.ser] as id
try-% FROM (SELECT ARRAY[ 2,4,6,7,8 ]) AS coll(ids),
try-% generate_series(1, 5) AS gs(ser);
ser | id
-----+----
1 | 2
2 | 4
3 | 6
4 | 7
5 | 8
(5 rows)
Because PostgreSQL FROM clauses may be SQL expressions, you can call generate_series() in the FROM clause and use its values to iterate over the array for each row. The key to being able to reference the series is the AS gs(ser) clause, which gives the call to generate_series() the table name gs and its lone column the name ser. Then simply reference the appropriate value in each row as gs.ser.
Thus, using the generate_series() set-returning function, you can get at each element in the array for each row, while also outputting the series number, all in a single query. This is exactly what you need in order to use batch updates in the collection-management functions.







