Enumerated Fields in PostgreSQL
by Andrew Dunstan01/06/2006
Many programming languages support an enumerated type in some form. In C and C++, they are a kind of symbolic mapping to integers, but in other languages (such as Ada), they are more first-class citizens in the type system. Leaving enumerations out of the original Java language definition was one of the worst mistakes its creators made. They have finally remedied this, but it took a long time.
What is an enumerated type? A moderately formal definition might be: "a type that can take one of a given ordered list of string values." In some cases, the programmer might be able to specify the underlying representation of the values, providing a sort of map, but I believe that's extraneous to the essence of enums.
Available Methods Today
PostgreSQL, my favorite database, doesn't have any built-in support for enums. MySQL does support enums, although it has some aspects I don't like. It's easy enough to roll your own enum in PostgreSQL, though. One simple way is to apply a check constraint to a field:
create table average_temperature
(
year integer,
temp real,
season text (check season in ('spring', 'summer', 'autumn', 'winter'))
);
You can also specify the constraint as part of a domain definition, so you don't have to keep specifying it. A domain is more or less a pre-existing type with altered constraints. Here's an example of using one to define an enumerated type:
create domain season_type
as text
check (value in ('spring', 'summer', 'autumn', 'winter'));
create table average_temperature
(
year integer,
temp real,
season season_type
);
There are a couple of downsides to using this method, however. One is that Postgres will store the data as text, potentially consuming more space than necessary. It will also perform comparison operations using text comparison, which will be slower than using comparison based on a more efficient representation. More importantly, the database will, by default, sort and compare values based on their lexical values, rather than the enumeration order. Nevertheless, it works well in some circumstances.
Another technique is to use a lookup table with a foreign key relationship. The lookup table has two fields: code, which is the primary key, and value, which contains the enumeration list. You might construct the previous example for this method like this:
create table season_lookup
(
code int primary key,
value text not null
);
insert into season_lookup values(1,'spring');
insert into season_lookup values(2,'summer');
insert into season_lookup values(3,'autumn');
insert into season_lookup values(4,'winter');
create table average_temperature
(
year integer,
temp real,
season int references season_lookup(code)
);
This method does not suffer from the defects of the previous methods. The data will be stored efficiently and sorted and compared in the order you defined, but it is a lot harder to use. With the methods that use a check constraint, you can insert a value of the enumeration directly into the table. There's no lookup required--there's nothing to look up. Your client either needs to know the code values or look them up each time. To get the database to print the values back out, you need to join the two tables. It's a lot more cumbersome to use.
Note also that with this method of implementing enumeration types, if you try to change the ordering, or insert a value in the middle of the ordering, you will get a foreign key constraint violation if you have used the type at all. There is a way around that: add on update cascade to the references clause above. That would make adding a value in the middle or changing the order possible, but still very messy. If you think you might need to do that, lookup tables might not be the best way to go.
Pages: 1, 2 |




