Many regard PostgreSQL as the state-of-the-art open source database application. Its roots go back to 1977 at UC Berkeley, and the story of how it reached its current status mirrors some of the best success stories we've heard before in open source.
If you're not familiar with PostgreSQL, this article will give you a nice foundation toward better understanding this truly remarkable database.
PostgreSQL's ancestor was Ingres, developed at the University of California at Berkeley from 1977 to 1985. The Ingres code was taken and enhanced by the Relational Technologies/Ingres Corporation, which produced one of the first commercially successful relational database servers. Also at Berkeley, Michael Stonebraker led a team during the 1986 to 1994 period to develop an object-relational database server called Postgres. The Postgres code was taken by Illustra and developed into a commercial product. Two Berkeley graduate students, Jolly Chen and Andrew Yu, added SQL capabilities to Postgres during 1994 and 1995 and called it Postgres95. They left Berkeley, but Chen continued maintaining Postgres95, which had an active mailing list.
In the summer of 1996, it became clear that the demand for an open source SQL database server was great, and a team was formed to continue development. Marc G. Fournier, in Toronto, offered to host the mailing list and provide a server to host the source tree. One thousand mailing list subscribers were moved to the new list. A server was configured, and a few people were given login accounts so they could apply patches to the source code using CVS.
By this point Jolly Chen had stated, "This project needs a few people with lots of time, not many people with a little time." With 250,000 lines of C code, it was easy to understand what he meant. In the early days, there were four people heavily involved: Marc Fournier in Canada, Thomas Lockhart in Pasadena, California, Vadim Mikheev in Krasnoyarsk, Russia, and me in Philadelphia, Pennsylvania. We all had full-time jobs, so we did this in our spare time. Calling this a challenge was an understatement.
Our first goal was to scour the old mailing list, evaluating patches that had been posted to fix various problems. The system was quite fragile then and not easily understood. During the first six months of development, there was fear that a single patch would break the system, and we would be unable to correct the problem. Many bug reports had us scratching our heads, trying to figure out not only what was wrong, but how the system even performed many functions.
We inherited a huge installed base. A typical bug report was, "When I do this, it crashes the database." We had a whole list of them. It became clear that some organization was needed. Most bug reports required significant research to fix, and many were duplicates, so our TODO list reported every buggy SQL query. It helped us identify our bugs, and made users aware of them too, cutting down on duplicate bug reports.
We had many eager developers, but the learning curve in understanding how the back-end worked was significant. Many developers got involved at the edges of the source code, like language interfaces or database tools, where things were easier to understand. Other developers focused on specific problem queries, trying to locate the source of the bug. It was amazing to see that many bugs were fixed with just one line of C code. Postgres had evolved in an academic environment and had not been exposed to the full spectrum of real-world queries. During that period, there was talk of adding features, but the instability of the system made bug fixing our major focus.
In late 1996, we changed the name from Postgres95 to PostgreSQL. It is a mouthful, but it honors both the Berkeley name and the SQL capabilities. We started distributing the source code using remote CVS, which allowed people to keep up-to-date copies of the development tree without downloading an entire set of files every day.
Releases occurred every 3 to 5 months. This time frame consisted of 2 to 3 months of development, a month of beta testing, a major release, and a few weeks to issue subreleases to correct serious bugs. We were never tempted to follow a more aggressive schedule with more releases. A database server is not like a word processor or a game, where you can easily restart it if there is a problem. Databases are multiuser, and they lock user data inside the database, so we had to make our software as reliable as possible.
Development of source code of this scale and complexity is not for the novice. We initially had trouble getting developers interested in a project with such a steep learning curve. However, our civilized atmosphere, and our improved reliability and performance, finally helped attract the experienced talent we needed.
Getting our developers the knowledge they needed to assist with PostgreSQL was clearly a priority. We had a TODO list that outlined what needed to be done, but with 250,000 lines of code, taking on any to-do item was a major project. We realized developer education would pay major benefits in helping people get started. We wrote a detailed flowchart of the back-end modules. We wrote a developers' FAQ to answer some of the common questions of PostgreSQL developers. With this, developers became more productive at fixing bugs and adding features.
The source code we inherited from Berkeley was very modular. However, most Berkeley coders used Postgres as a test bed for research projects. Improving existing code was not a priority. Their coding styles were also quite varied.
We wrote a tool to reformat the entire source tree in a consistent manner. We wrote a script to find functions that could be marked as static, or unused functions that could be removed completely. These are run just before each release. A release checklist reminds us of the items to be changed for each release.
As we gained knowledge of the code, we were able to perform more complicated fixes and feature additions. We redesigned poorly structured code. We moved into a mode where each release had major new features instead of just bug fixes. We improved SQL conformance, added sub-selects, improved locking, and added missing SQL functionality. A company formed to offer telephone support.
The Usenet discussion group archives started touting us. A year earlier, when we searched the archives for PostgreSQL, we found many people recommending other databases, even though we were addressing user concerns as rapidly as possible. One year later, many people were recommending us to users who needed transaction support, complex queries, commercial-grade SQL support, complex data types, and reliability. These recommendations clearly reflected our strengths. Other databases were recommended when speed was the overriding concern. Red Hat's shipment of PostgreSQL as part of their Linux distribution quickly expanded our user base.
Every release is now a major improvement over the last. Our global development team now has mastery of the source code we inherited from Berkeley. Finally, every module is understood by at least one development team member. We are now easily adding major features, thanks to the increasing size and experience of our world-wide development team.
Bruce Momjian is writing a book about PostgreSQL for Addison-Wesley, tentatively titled, PostgreSQL: Introduction and Concepts.
Copyright © 2009 O'Reilly Media, Inc.