Batch Updates with PL/pgSQL
Pages: 1, 2, 3, 4, 5
Improved Tag-Adding Function
Just as a reminder, here's how the original entry_coll_tag_add() function looked:
1 CREATE OR REPLACE FUNCTION entry_coll_tag_add (
2 obj_id integer,
3 coll_ids integer[]
4 ) RETURNS VOID AS $$
5 DECLARE
6 last_ord smallint;
7 got_ord boolean;
8 iord integer := 1;
9 BEGIN
10 PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;
11
12 SELECT INTO last_ord COALESCE(max(tag_order), 0)
13 FROM entry_coll_tag
14 WHERE entry_id = obj_id;
15
16 FOR iloop IN 1..array_upper(coll_ids, 1) LOOP
17 IF coll_ids[iloop] IS NULL THEN
18 CONTINUE;
19 END IF;
20
21 SELECT INTO got_ord true
22 FROM entry_coll_tag
23 WHERE entry_id = obj_id
24 AND tag_id = coll_ids[iloop];
25
26 IF got_ord IS NULL THEN
27 INSERT INTO entry_coll_tag (entry_id, tag_id, tag_order)
28 VALUES (obj_id, coll_ids[iloop], last_ord + iord);
29 iord := iord + 1;
30 END IF;
31 END LOOP;
32 END;
33 $$ LANGUAGE plpgsql;
The bit that I want to eliminate is the loop. It's going to be a bit tricky, because it executes a SELECT to see if a record already exists, and only does the insert if it doesn't already exist. That won't work for the batch insert, but because it uses an expression to batch it, a WHERE clause can limit the array values that are actually inserted. Here's how that change looks:
1 CREATE OR REPLACE FUNCTION entry_coll_tag_add (
2 obj_id integer,
3 coll_ids integer[]
4 ) RETURNS VOID AS $$
5 DECLARE
6 last_ord smallint;
7 BEGIN
8 PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;
9
10 SELECT INTO last_ord COALESCE(MAX(ord), 0)
11 FROM entry_coll_tag
12 WHERE entry_id = obj_id;
13
14 INSERT INTO entry_coll_tag (entry_id, tag_id, ord )
15 SELECT obj_id, coll_ids[gs.ser], gs.ser + last_ord
16 FROM generate_series(1, array_upper(coll_ids, 1)) AS gs(ser)
17 WHERE coll_ids[gs.ser] NOT IN (
18 SELECT tag_id FROM entry_coll_tag
19 WHERE entry_id = obj_id
20 );
21
22 END;
23 $$ LANGUAGE plpgsql;
The first difference to note is the reduced number of declared variables.
Using generate_series() to generate the order numbers, and not
using a SELECT statement, makes the got_ord and iord variables unnecessary (although there's more on iord in a bit).
The first two statements in the body of the function are the same as before, namely the PERFORM statement, which locks the entry record, and the SELECT statement, which determines the highest existing value for the entry ID's tag_order column stored in obj_id. Rather than the loop, there is a single INSERT statement. Taking a closer look...
- Line 14 starts the
INSERTstatement as usual. - Line 15 starts the
SELECTstatement that makes up the expression part of theINSERTstatement. It uses the obj_id variable for the entry_id column,coll_ids[gs.ser]for the tag_id column, andgs.ser + last_ordfor the tag_order column. Where did gen.ser come from? I'm so glad you asked! - Line 16 uses
generate_series()to create a series corresponding to the number of items in the coll_ids array, usingarray_upper()to determine that number. Because PostgreSQL allows an expression in aFROMclause, you can treat the values returned bygenerate_series()as a table with a single column. TheAS gen(ser)syntax gives a convenient name for thetable
(gen) and its lone column (ser). - Lines 17-20 contain the
WHEREclause for theSELECTstatement. It uses a subquery to prevent inserting any tag IDs associated with the entry ID in obj_id. Unlike the original version, this function doesn't have to check forNULLvalues. ThisWHEREexpression ensures thatNULLwill never beINany list of values.
That's it. Really! No matter how many tag IDs you pass in the coll_ids array, it will execute no more than three queries total. Given that the previous version of the function would execute four queries when coll_ids had but a single ID (potentially adding two more queries for every additional ID), it's clear that this version, with its constant number of queries, is a big win. The fact that it actually takes less code to write doesn't hurt, either.
That said, this function does not behave identically to the original; it may not insert the new IDs with perfectly sequential values for the tag_order column. iord served this purpose in the previous version of the function, but there's no place for it in the new version. For example, if you had called the function with:
SELECT entry_coll_tag_add(1, '{5,13,65,12}');
... where entry ID 1 already had tag IDs 13 and 65 associated with it. In such a case, the result might look something like:
try=% SELECT * FROM entry_coll_tag WHERE entry_id = 1 ORDER BY tag_order;
entry_id | tag_id | tag_order
----------+--------+-----------
1 | 13 | 1
1 | 65 | 2
1 | 5 | 3
1 | 12 | 6
(4 rows)
Note how the tag_order jumps from 3 to 6. This is because IDs 13 and 65 would have been ordered 4 and 5, but because they weren't inserted, neither were their order numbers. In truth, this situation existed already, thanks to the entry_coll_tag_del() function created in the previous article, since that function deletes certain tag IDs without resetting the tag_order of any remaining rows. But in most applications this shouldn't matter, as long as you use tag_order purely for ordering, rather than for ordinal positional values. It does not represent array indexes, only ordering.







