Many programming languages support an enumerated type in some form. In C and C++, they are a kind of symbolic mapping to integers, but in other languages (such as Ada), they are more first-class citizens in the type system. Leaving enumerations out of the original Java language definition was one of the worst mistakes its creators made. They have finally remedied this, but it took a long time.
What is an enumerated type? A moderately formal definition might be: "a type that can take one of a given ordered list of string values." In some cases, the programmer might be able to specify the underlying representation of the values, providing a sort of map, but I believe that's extraneous to the essence of enums.
PostgreSQL, my favorite database, doesn't have any built-in support for enums. MySQL does support enums, although it has some aspects I don't like. It's easy enough to roll your own enum in PostgreSQL, though. One simple way is to apply a check constraint to a field:
create table average_temperature
(
year integer,
temp real,
season text (check season in ('spring', 'summer', 'autumn', 'winter'))
);
You can also specify the constraint as part of a domain definition, so you don't have to keep specifying it. A domain is more or less a pre-existing type with altered constraints. Here's an example of using one to define an enumerated type:
create domain season_type
as text
check (value in ('spring', 'summer', 'autumn', 'winter'));
create table average_temperature
(
year integer,
temp real,
season season_type
);
There are a couple of downsides to using this method, however. One is that Postgres will store the data as text, potentially consuming more space than necessary. It will also perform comparison operations using text comparison, which will be slower than using comparison based on a more efficient representation. More importantly, the database will, by default, sort and compare values based on their lexical values, rather than the enumeration order. Nevertheless, it works well in some circumstances.
Another technique is to use a lookup table with a foreign key relationship. The lookup table has two fields: code, which is the primary key, and value, which contains the enumeration list. You might construct the previous example for this method like this:
create table season_lookup
(
code int primary key,
value text not null
);
insert into season_lookup values(1,'spring');
insert into season_lookup values(2,'summer');
insert into season_lookup values(3,'autumn');
insert into season_lookup values(4,'winter');
create table average_temperature
(
year integer,
temp real,
season int references season_lookup(code)
);
This method does not suffer from the defects of the previous methods. The data will be stored efficiently and sorted and compared in the order you defined, but it is a lot harder to use. With the methods that use a check constraint, you can insert a value of the enumeration directly into the table. There's no lookup required--there's nothing to look up. Your client either needs to know the code values or look them up each time. To get the database to print the values back out, you need to join the two tables. It's a lot more cumbersome to use.
Note also that with this method of implementing enumeration types, if you try to change the ordering, or insert a value in the middle of the ordering, you will get a foreign key constraint violation if you have used the type at all. There is a way around that: add on update cascade to the references clause above. That would make adding a value in the middle or changing the order possible, but still very messy. If you think you might need to do that, lookup tables might not be the best way to go.
|
Related Reading Practical PostgreSQL |
|
With all this in mind I created EnumKit for PostgreSQL. It works with PostgreSQL 8.0 and later. It requires installation privileges for your database share/ and lib/ directories, too. Basically, it is an attempt to provide a facility for enums that have the ease of use of the check methods and the correctness of the lookup tables, plus high efficiency. The downside is that for each enum type you want to create, you have to do a little compilation and installation. Fortunately, it's not very hard, and you don't have to write any C--I've done that part for you.
Here's how the seasons example looks with EnumKit:
make TYPENAME=seasons ENUMS=' "spring", "summer", "autumn", "winter" '
make TYPENAME=seasons install
psql -f /path/to/your/pgsql/contrib/seasons-install.sql yourdbname
The quotes are important here, and if you muck them up you will get a compilation error. The values must be a comma-separated list of double-quoted strings. Make sure to enclose the whole list in single quotes.
Now you can just use the new type like in the domain example above, but PostgreSQL stores it efficiently (as a 16-bit integer), and sorts and compares it in the desired order.
create table average_temperature
(
year integer,
temp real,
season seasons
);
insert into average_temperature values(1999,56.7,'winter');
insert into average_temperature values(1999,78.3,'summer');
select *
from average_temperature
order by year, season;
One possible downside of using an EnumKit-created type is that modifying the type is at best dangerous. Once it is in place and you have used it, you really should not touch it. What if you need to modify it? The best way is to create a new type that incorporates the modifications, and then modify each table that uses the type. Luckily, modern versions of PostgreSQL have extremely good support for modifying table columns. Here is an example where I defined a column with an rgb enumeration type, and then changed it to a rainbow enumeration type. Note that although several values (notably blue and green) change position in the enumeration order in the change, their values remain.
andrew=# create table foo (i serial, c rgb);
NOTICE: CREATE TABLE will create implicit sequence "foo_i_seq" for serial
column "foo.i"
CREATE TABLE
andrew=# insert into foo (c) values ('blue');
INSERT 8711471 1
andrew=# insert into foo (c) values ('green');
INSERT 8711472 1
andrew=# insert into foo (c) values ('red');
INSERT 8711473 1
andrew=# select * from foo order by c;
i | c
---+-------
3 | red
2 | green
1 | blue
(3 rows)
andrew=# insert into foo (c) values ('yellow');
ERROR: invalid input value for enum: "yellow"
andrew=# alter table foo alter column c type rainbow using c::text;
ALTER TABLE
andrew=# select * from foo order by c;
i | c
---+-------
3 | red
2 | green
1 | blue
(3 rows)
andrew=# insert into foo (c) values ('yellow');
INSERT 8711477 1
andrew=# select * from foo order by c;
i | c
---+--------
3 | red
4 | yellow
2 | green
1 | blue
(4 rows)
Starting with PostgreSQL version 8.0, there is available a facility called PGXS that makes creating and installing PostgreSQL extensions a whole lot easier. The PGXS documentation describes it in some detail. Basically, you put a few entries into a makefile and PGXS will do most of the heavy lifting for you. I have used this in EnumKit, with a little twist to make it possible to define not one extension module but a family of them, one for each enumeration type.
This is just one example of a PostgreSQL extension module--probably one of the simplest you will see. PostgreSQL is extensible by design, and there are many examples of extension modules around, including functions, types, and server-side programming languages. The extensibility of PostgreSQL is one of its great strengths and makes it a joy to work with.
Essentially, EnumKit makes a copy of generic.c and generic-install.sql with the type name and values plugged in, and then compiles the C file into a loadable module. The SQL file is necessary to set up the type in a particular database.
The next step is really to remove the requirement for an extension module. Ideally, you should be able to do all this at the SQL level without having to mess with makefiles, privileges, etc. Any user should be able to define an enumeration type and use it as effectively and easily as they can with types created by EnumKit. Thus, EnumKit is really the result of some experimentation and a bit of a stop-gap.
How to do enums right (or even whether to do them at all) has been the subject of ongoing debate in the PostgreSQL community. As usual, this debate has taken place in the open on the PostgreSQL Hackers mailing list (available from the PostgreSQL mailing list archives).
However, there seems to be general agreement that it is worth doing. I intend to work on it for the 8.2 release. In the meantime, the techniques presented in this article should be able to satisfy most needs.
Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.
Return to the Databases DevCenter
Copyright © 2009 O'Reilly Media, Inc.