Using Regular Expressions in PostgreSQLby Robert Bernier
You know what I love about open source? The tricks you learn in one activity can effectively transfer over to another one. It's like using the skills learned from mathematics on the dance floor. Take regular expressions, for example. Did you know that becoming an expert in using regular expressions in Perl also makes you a PostgreSQL power user?
About Regular Expressions
Regular expressions, for the uninitiated, are a set of key combinations that allow people to search for character matches, thus enabling a large variety of control in a text environment. Regular expressions are a staple of the Unix-like operating systems, although they are virtually unknown in one notable exception (go ahead and guess). They are a common feature of many applications, including text editors, command-line utilities, and programming languages to search and manipulate bodies of text based on certain patterns.
My own experiences with regular expressions began with vi (actually, Vim).
PostgreSQL Pattern Matching
PostgreSQL supports many forms of pattern matching: the traditional SQL
LIKE operator, the more recent
SIMILAR TO operator (added in SQL 1999), and the Posix-style regular expressions (also implemented in the pattern-matching function
More than likely a database administrator will have cut his teeth by first learning some form of programming or, better yet, systems administration on his chosen platform before playing with a database engine. In other words, he likely has already seen regular expressions long before writing his first SQL statement.
What's the big advantage using regular expressions in PostgreSQL, besides the fact that you already know about regular expressions? Two words: power and speed.
Regular expressions offer the DBA the ability to compose queries using highly complex criteria in a compact and sophisticated manner. Better, they can make your statements execute faster. Regular expressions are simpler than the longer and, sometimes, more resource-intensive operators
SIMILAR TO. Many SQL statements also need to couple traditional operators with user-defined functions to accomplish the same query as a single statement using regular expressions.
Using Regexes in PostgreSQL
To use regexes in PostgreSQL, you must have sufficient privileges in your installation to create and populate a table. Check the postgresql.conf configuration file; you need either advanced or extended permissions.
The rest of this article presents examples based upon two single-column tables. For the sake of brevity, I've omitted the results. Therefore review the query results with a replicated session on your machine.
First, create a new database containing a table:
CREATE DATABASE regex; CREATE TABLE myrecords(record text);
psql command line, populate the
myrecords table. (Note: you must include a backslash followed by a period on a separate line to end date input for the
COPY myrecords (record) FROM stdin; a ab abc 123abc 132abc 123ABC abc123 4567 5678 6789 \.
Matching single characters
The simplest query uses the tilde operator,
~, followed by literal quoted characters. These examples return all records that contain specific character(s) as defined by the criteria:
SELECT record FROM myrecords WHERE record ~ '1'; SELECT record FROM myrecords WHERE record ~ 'a'; SELECT record FROM myrecords WHERE record ~ 'A'; SELECT record FROM myrecords WHERE record ~ '3a';
The addition of the asterisk,
*, after the tilde renders the query search case-insensitive:
SELECT record FROM myrecords WHERE record ~* 'a'; SELECT record FROM myrecords WHERE record ~* '3a';
The exclamation mark (or "bang"),
!, modifies the tilde operator and excludes strings containing the character(s):
SELECT record FROM myrecords WHERE record !~ '1';
You can combine case insensitivity and exclusion:
SELECT record FROM myrecords WHERE record !~* 'c';
^, known as an anchor, returns records with the specific character(s) located at the beginning of the string:
SELECT record FROM myrecords WHERE record ~ '^1'; SELECT record FROM myrecords WHERE record ~ '^a'; SELECT record FROM myrecords WHERE record ~* '^a';
The dollar sign,
$, is also an anchor but at the end of the string:
SELECT record FROM myrecords WHERE record ~ 'c$'; SELECT record FROM myrecords WHERE record ~ 'bc$'; SELECT record FROM myrecords WHERE record ~* 'bc$';