Batch Updates with PL/pgSQL
Pages: 1, 2, 3, 4, 5
Improved Tag-Setting Function
What of entry_coll_tag_set()? Remember the original:
1 CREATE OR REPLACE FUNCTION entry_coll_tag_set (
2 obj_id integer,
3 coll_ids integer[]
4 ) RETURNS VOID AS $$
5 BEGIN
6 PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;
7
8 UPDATE entry_coll_tag
9 SET tag_order = -tag_order
10 WHERE entry_id = obj_id
11
12 FOR iloop IN 1..array_upper(coll_ids, 1) LOOP
13 IF coll_ids[iloop] IS NULL THEN
14 CONTINUE;
15 END IF;
16
17 UPDATE entry_coll_tag
18 SET tag_order = iloop
19 WHERE entry_id = obj_id
20 AND tag_id = coll_ids[iloop];
21
22 IF FOUND IS false THEN
23 INSERT INTO entry_coll_tag (entry_id, tag_id, tag_order)
24 VALUES (obj_id, coll_ids[iloop], iloop);
25 END IF;
26 END LOOP;
27
28 DELETE FROM entry_coll_tag
29 WHERE entry_id = obj_id AND tag_order < 0;
30 END;
31 $$ LANGUAGE plpgsql;
The number of queries run by this function is even more variable than the number run by entry_coll_tag_add(), as each ID in coll_ids will trigger the execution of either one or two queries: one if the UPDATE statement updates a row, and two if it doesn't. Here is the batch-updating version:
1 CREATE OR REPLACE FUNCTION entry_coll_tag_set (
2 obj_id integer,
3 coll_ids integer[]
4 ) RETURNS VOID AS $$
5 BEGIN
6 PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;
7
8 UPDATE entry_coll_tag
9 SET ord = -ord
10 WHERE entry_id = obj_id;
11
12 IF FOUND IS false THEN
13 INSERT INTO entry_coll_tag (entry_id, tag_id, ord)
14 SELECT obj_id, coll_ids[gs.ser], gs.ser
15 FROM generate_series(1, array_upper(coll_ids, 1))
16 AS gs(ser)
17 WHERE coll_ids[gs.ser] IS NOT NULL;
18 ELSE
19 UPDATE entry_coll_tag SET ord = ser
20 FROM (
21 SELECT gs.ser, coll_ids[gs.ser] as move_tag
22 FROM generate_series(1, array_upper(coll_ids, 1)) AS gs(ser)
23 WHERE coll_ids[gs.ser] IS NOT NULL
24 ) AS expansion
25 WHERE move_tag = entry_coll_tag.tag_id
26 AND entry_id = obj_id;
27
28 INSERT INTO entry_coll_tag (entry_id, tag_id, ord )
29 SELECT obj_id, coll_ids[gs.ser], gs.ser
30 FROM generate_series(1, array_upper(coll_ids, 1)) AS gs(ser)
31 WHERE coll_ids[gs.ser] NOT IN (
32 SELECT tag_id FROM entry_coll_tag
33 WHERE entry_id = obj_id
34 );
35
36 DELETE FROM entry_coll_tag
37 WHERE entry_id = obj_id AND ord < 0;
38 END IF;
39 END;
40 $$ LANGUAGE plpgsql;
I admit that, at first glance, this looks more complicated than the original, but that's only because I added an optimization that I could have included in the original, had I noticed it before. Here is the function piece-by-piece.
- Once again, the first two statements in the function body are the same as before. The function retains the entry row locking
PERFORMstatement and theUPDATEstatement that negates the values for the tag_order column for all records associated with obj_id. - Line 12 is where the changes begin, and also where I added the optimization missing in the original. If the globally available PL/pgSQL Boolean variable
FOUNDisfalse, it means that theUPDATEstatement in lines 8-10 updated no records, and there is no need for any fancy tricks to update previously existing rows, because there are none. - Lines 13-17 use
generate_series()in a batchINSERTstatement to simply insert all of the tag IDs in coll_ids. This is about as clear as it gets, with the gs.sercolumn
both accessing individual array values and supplying the corresponding value for tag_order. TheWHERE coll_ids[gs.ser] IS NOT NULLexpression skips over anyNULLvalues that might be in coll_ids. - Line 18 starts the block that handles the case where the
UPDATEstatement at lines 8-10 did update one or more rows. The upshot is that any tag IDs in coll_ids that are already in the table must be updated to their appropriate tag_order values, those that don't exist must be inserted, and any that are left over must be deleted. - Accordingly, lines 19-26 use a batch
UPDATEto update any existing records with new tag_order values. It usesgenerate_series()in theFROMclause to create the appropriate rows to be updated, with aWHEREclause in the expression that creates the gs table to skipNULLvalues and aWHEREclause for theUPDATEstatement itself to map theUPDATEto the appropriate rows, where the entry ID and tag ID match. - Rows 28-34 then insert the new tags into the collection table, once again using a batch
INSERTstatement withgenerate_series()to create the new rows to be added. As in the revampedentry_coll_tag_add()function, theWHEREclause skipsNULLs and prevents inserting existing tags, which would conflict with the unique index. - Finally, rows 36-37 delete any leftover associations, just as before, by removing those with a tag_order less than zero.
Phew! That was a lot to cover, but the result is that this function now also runs a constant number of queries for each execution. Or nearly constant. If there are no existing rows in the collection table, it will run only three queries. Otherwise, it will run five. In neither case will it ever run any more than five.







