O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Discuss Subscribe to Databases Subscribe to Newsletters

Using Regular Expressions in PostgreSQL

by Robert Bernier
02/02/2006

You know what I love about open source? The tricks you learn in one activity can effectively transfer over to another one. It's like using the skills learned from mathematics on the dance floor. Take regular expressions, for example. Did you know that becoming an expert in using regular expressions in Perl also makes you a PostgreSQL power user?

About Regular Expressions

Regular expressions, for the uninitiated, are a set of key combinations that allow people to search for character matches, thus enabling a large variety of control in a text environment. Regular expressions are a staple of the Unix-like operating systems, although they are virtually unknown in one notable exception (go ahead and guess). They are a common feature of many applications, including text editors, command-line utilities, and programming languages to search and manipulate bodies of text based on certain patterns.

My own experiences with regular expressions began with vi (actually, Vim).

PostgreSQL Pattern Matching

PostgreSQL supports many forms of pattern matching: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL 1999), and the Posix-style regular expressions (also implemented in the pattern-matching function substring).

More than likely a database administrator will have cut his teeth by first learning some form of programming or, better yet, systems administration on his chosen platform before playing with a database engine. In other words, he likely has already seen regular expressions long before writing his first SQL statement.

What's the big advantage using regular expressions in PostgreSQL, besides the fact that you already know about regular expressions? Two words: power and speed.

Regular expressions offer the DBA the ability to compose queries using highly complex criteria in a compact and sophisticated manner. Better, they can make your statements execute faster. Regular expressions are simpler than the longer and, sometimes, more resource-intensive operators LIKE and SIMILAR TO. Many SQL statements also need to couple traditional operators with user-defined functions to accomplish the same query as a single statement using regular expressions.

Using Regexes in PostgreSQL

To use regexes in PostgreSQL, you must have sufficient privileges in your installation to create and populate a table. Check the postgresql.conf configuration file; you need either advanced or extended permissions.

The rest of this article presents examples based upon two single-column tables. For the sake of brevity, I've omitted the results. Therefore review the query results with a replicated session on your machine.

First, create a new database containing a table:

CREATE DATABASE regex;
CREATE TABLE myrecords(record text);

From the psql command line, populate the myrecords table. (Note: you must include a backslash followed by a period on a separate line to end date input for the COPY command.)

COPY myrecords (record) FROM stdin;
a
ab
abc
123abc
132abc
123ABC
abc123
4567
5678
6789
\.

Matching single characters

The simplest query uses the tilde operator, ~, followed by literal quoted characters. These examples return all records that contain specific character(s) as defined by the criteria:

SELECT record FROM myrecords WHERE record ~ '1';
SELECT record FROM myrecords WHERE record ~ 'a';
SELECT record FROM myrecords WHERE record ~ 'A';
SELECT record FROM myrecords WHERE record ~ '3a';

The addition of the asterisk, *, after the tilde renders the query search case-insensitive:

SELECT record FROM myrecords WHERE record ~* 'a';
SELECT record FROM myrecords WHERE record ~* '3a';

The exclamation mark (or "bang"), !, modifies the tilde operator and excludes strings containing the character(s):

SELECT record FROM myrecords WHERE record !~ '1';

You can combine case insensitivity and exclusion:

SELECT record FROM myrecords WHERE record !~* 'c';

The caret, ^, known as an anchor, returns records with the specific character(s) located at the beginning of the string:

SELECT record FROM myrecords WHERE record ~ '^1';
SELECT record FROM myrecords WHERE record ~ '^a';
SELECT record FROM myrecords WHERE record ~* '^a';

The dollar sign, $, is also an anchor but at the end of the string:

SELECT record FROM myrecords WHERE record ~ 'c$';
SELECT record FROM myrecords WHERE record ~ 'bc$';
SELECT record FROM myrecords WHERE record ~* 'bc$';

Pages: 1, 2, 3

Next Pagearrow




Tagged Articles

Be the first to post this article to del.icio.us

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
June 2009
$9.99 USD

New Features in Oracle 9i New Features in Oracle 9i
by Howard J. Rogers
June 2009
$5.95 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com