We depend on databases for everything, and nearly every useful application talks to some sort of database. Anybody who has written software knows that communicating with databases can be one of the most nettlesome tasks; not because databases are inherently difficult to work with, but because they are difficult to install and configure.
Case in point: I am working on a Python application that requires a smallish back-end database to store documents and user information. My design objective for this project is to make installation as simple as possible, with no dependencies upon external packages or extensions (that is, pure Python).
I thought this would be easy enough, as Python is a great language to work with. I quickly discovered that getting it to work with external databases apart from Berkeley DB (via
anydbm) is not so simple. Once you get a particular package, such as PySQLite, installed and configured, it's easy enough. However, try explaining to an end user how to install external packages if they are not already familiar with Python (most aren't). I've spent far too much time fighting with database issues, versus building and debugging my application.
In theory, Python, like most other languages, provides a standard database API. This specification provides high-level access to common database functions such as paging through a recordset. The problem is that each database requires a different extension to Python. Some of these extensions, like PySQLite, are relatively easy to install. Others are a nightmare. This is true of any language, as each database engine has a different installation procedure, as well as a large number of configuration options, any one of which can break a client/server application.
What I don't understand is why, in 2005, we are still using proprietary and unnecessarily complicated wire protocols to talk to database servers when we could be using something much simpler, something akin to RSS. The vast majority of database applications perform a fairly small set of tasks, such as submitting a SQL query, paging through a recordset, and adding a record. Why do we need something like ODBC to do this? The answer is we don't, for many applications.
Most of what we use databases for, we can accomplish with XML-RPC and SQL. So why not define a simple XML-RPC interface (Really Simple Database Protocol, or RSDP), that allows applications to communicate with SQL databases without the need for troublesome external packages and middleware? People are already doing this with in-house projects, so all that is needed is for a couple of vendors to build simple XML-RPC front ends to existing systems.
RSDP, or whatever you want to call it, would not replace existing DB interfaces, nor would it be appropriate for every situation (I'll detail the limitations of this approach later in the article). It would provide developers with a way to prototype and build database driven-applications more independently of back-end systems. The software industry is full of examples of simple "duct tape" solutions, such as RSS, that have become de facto standards despite their limitations.
What might this XML-RPC interface look like? Just one XML-RPC method will do the trick for a basic implementation. Try this on for size:
rsdp(dbname,username,pw,sql) rsdp("customers.sql","foo","bar","SELECT * FROM customers WHERE lastlogin >= '20041231' ORDER BY lastname;")
The client application simply invokes this method on the database server, which checks that the user has permission to perform this operation, opens the desired database, performs the SQL query, and responds with an error code or XML recordset. That's all there is to it, and this will be easy to implement in any XML-RPC-aware programming language, which is to say pretty much all of them.
This is obviously a simplified example, but need not be expanded upon very much to provide most of the functionality that is required for most applications. Simply providing a way to submit SQL queries via XML-RPC will enable developers to use whatever SQL features are recognized by the database server, including
An interface like RSDP will allow developers to write database applications that are fairly platform-neutral, and will eliminate the need for them to integrate proprietary database extensions such as ODBC drivers with their applications. This will shorten development cycles substantially, and will make installation and troubleshooting much easier. Of course, not every vendor reads SQL the same way (for example, SQLite does not recognize the
ALTER TABLE command), but it is relatively easy to deal with these differences, especially for common
Something like RSDP will also enable the creation of inexpensive database-hosting services that eliminate the need for locally hosted databases. An RSDP database service would enable developers to build highly distributed systems that talk to remote databases that could be located anywhere on the internet. For example, you could write a simple PHP script that lives on your shared web server that, in turn, talks to an RSDP server elsewhere on the Web. There's no need to install and configure MySQL on your web server; thus, the web applications and back-end databases can be treated as separate projects with fewer dependencies. This opens up a lot of possibilities to create new categories of applications.
RSDP is admittedly a "duct tape" type of solution that many of you may frown upon. I like to compare it to RSS, which was designed to solve a specific problem, content syndication, and did so simply. It has since evolved into a widely used format, and has been adapted to many applications. The Web is full of simple and stupid interfaces that have proven to be effective ways to solve problems. So bear in mind that RSDP admits its simplicity and stupidity up front. I am not advocating its use in situations where it is not appropriate.
The editors who reviewed this article raised a number of objections to this approach, which included: scalability and performance, SQL portability, network overhead, security, support for BLOBs, stored procedure support, and XML inefficiency. These are all valid concerns that I will address one by one.
RSDP has obvious performance limitations because of its dependence on HTTP, XML, and text-encoding of binary information (for example, Base64 encoding).
Database developers obviously value performance for enterprise applications. RSDP is not designed with performance in mind. Its purpose is to provide developers with an easy way to talk to database servers without the need to install drivers or specialized libraries on client machines. Ease of use and installation comes at the expense of performance. The same is true of many tools; interpreted languages are a prominent example. Many developers are happy to trade runtime performance for cross-platform portability.
Most applications do not require massive centralized databases. My project is a case in point. My program will need to store a few hundred documents at any given time. It does not require an especially fast or large database. The same is true of many applications on the Web. A few grow very large, but most do not.
If your application requires a high-performance database, RSDP is probably not a good fit. In that case, build your application around a specific database interface.
SQL is a kind-of, sort-of standard. Each vendor implements it slightly differently. Thus, RSDP cannot provide complete portability. That's OK, because most developers, even for smaller projects, will settle on one or two preferred back ends. The real value of RSDP is in simplifying installation and troubleshooting. By using a simple XML-RPC interface, RSDP eliminates the need to install specialized drivers or libraries, which eliminates one of the main sources of installation problems.
Furthermore, while SQL is not implemented consistently, it is usually close enough that porting SQL statements from one server to another is straightforward, especially for common types of queries. It's not automatic, but it's not a nightmare, especially if the application avoids using vendor-specific commands.
A common objection is that XML over HTTP is wasteful. This is true, but the same can be said of many internet protocols. I would guess that RSDP would use several times as much bandwidth as an efficiently encoded query. Again, that's OK, because it's not mandatory that everybody use this interface. If you're really concerned about network overhead, you can always use the standard tools.
There is one interesting aside from the Voice over IP business (my main background is in telecom). VoIP systems allow users and sysadmins to select from a number of compression algorithms to adjust the amount of bandwidth required for each voice call. The base data rate for an uncompressed phone call is 64kbps in each direction. Some VoIP codecs reduce this to well under 10kbps. What's interesting is that, except in situations where bandwidth is really scarce, most people opt for higher bitrate codecs.
Except for high-performance systems, I think the concerns about network overhead are overstated.
Security is an obvious concern, as an unsecured XML-RPC interface would be an invitation for people to do all sorts of unpleasant things. However, this is really no more of a concern here than it is for CGI scripts (a badly written CGI script can give someone root access to the system shell).
There are a number of things that can be done to secure an RSDP server, including:
ALTER, for the XML-RPC interface.
A good RSDP server implementation will provide the sysadmin with a high degree of control over how the XML-RPC interface is presented to the outside world, or whether to enable it at all.
One of the reviewers pointed out that XML-RPC is not a good way to deliver BLOBs. I was not thinking of BLOBs when I wrote this article (my habit has always been to use databases to point to other resources, and to store large files somewhere else). However, it will be possible to send them using Base64 encoding. It's not very efficient, but it will work. If you need to support BLOBs, RSDP may not be a very good way to do this.
Stored procedures are obviously an important tool for database developers. Although RSDP does not address this directly, most SQL databases provide the ability to create and call stored procedures via the SQL language (for example, using the
CREATE PROCEDURE command), or via separate system administration tools. In most cases, it should be possible to invoke stored procedures via the
It's clear that RSDP is not the fastest or most efficient way to talk to database servers. Depending on your goals, that may or may not be a problem. Most of the projects I work on are small- to medium-sized projects where ease of use is more important than speed. So I can live with a performance hit if it means I can quickly prototype an application and focus most of my time on its features, and not on figuring out why a web script is generating an error when it tries to call a DB API function.
I suspect I am not alone in wanting a better way to talk to databases. After all, if performance were the primary design criteria for software, we'd all be programming in C. Judging by the number of books O'Reilly sells about scripting languages, that's clearly not the case.
As specifications go, RSDP is about as simple as it gets; almost simple enough to squeeze onto a cocktail napkin.
An RSDP server should expect to receive either an XML-RPC (machine-machine) or HTTP/CGI (human-machine) query. The server will listen on two user defined ports, one for HTTP, one for HTTPS.
If the client opens the base URL for the server, the server responds by serving an HTML form that allows the user to interact with the server manually. This is provided so that users can interact with the server manually. It need not be an especially pretty interface, just sufficient so that developers can submit test queries that mirror XML-RPC queries. This will be helpful for troubleshooting purposes.
If the client submits an XML-RPC query, the server should present a single XML-RPC method,
rsdp(), which is invoked as follows:
rdsp(dbname, username, password, querytype, query, strict, outputformat) rsdp("books","brian","password","sql","SELECT * FROM books WHERE author = 'Brian McConnell' ORDER BY pubdate;'",false,"xml") NOTE: this instance of rsdp() invokes a SQL query, but the querytype param allows other query languages such as XQuery to be used as well.
The RSDP server then invokes the query expression, after first validating the user. The server is responsible for implementing security policies (for example, access control lists, IP filtering, etc.) internally.
If the client is interacting with the server via the HTML form interface, the server will present records within tables, perhaps with a hyperlink to display the same recordset in XML.
If the client is interacting with the server via XML-RPC, the server will reply by either returning an error code, or by returning an XML recordset, as shown in the example below (data types are omitted for brevity). If the optional strict parameter is true, the server returns strictly typed XML, using the basic data types defined in the XML spec. If false or omitted, the server returns loosely typed XML and assumes the client application will perform any necessary data type conversions. The
outputformat parameter allows the client to request other formats besides XML, such as CSV, HTML, and so on.
<recordset> <count>X</count> <record> <recid>1234</recid> <firstname>John</firstname> <lastname>Doe</lastname> </record> <record> .... </record> </recordset>
Implementing RSDP should require a minimal effort on the part of database vendors. If vendors are slow to do this, developers can build scripts to provide this functionality in the interim. Since it is trivial to build XML-RPC web services in languages like PHP, developers need not wait for vendors to ship direct support for RSDP and can build their own solutions. In the long run, it will be best if vendors support something like this directly, as they will be able to optimize the interface for performance, as well as to implement security features within the database engine (rather than rely on an external script to intercept malicious SQL commands).
Using RSDP in client-side applications will generally be quite straightforward, since most languages now support XML-RPC.
Developers can further simplify the interface by writing small class libraries or wrapper functions that make the XML-RPC interface look like a conventional database API.
For example, one might create a
rsdp class in Python that exposes a set of methods such as:
rsdp.execute(): Execute SQL query.
rsdp.forward(): Skip forward one record.
rsdp.back(): Skip back one record.
rsdp.first(): Jump to first item in recordset.
rsdp.last(): Jump to last item in recordset.
This approach will provide developers with something that looks and behaves as expected, while also providing an additional layer of idiot-proofing. In fact, the class will just submit a query, store the results, and provide the same type of cursor methods found in existing database APIs. This won't improve performance, but it will make the interface to RSDP consistent with conventional APIs. This will also ease the transition to vendor-specific interfaces if a project outgrows RSDP.
RDSP, or whatever it ends up being called, will not replace current database access protocols, but it will have an important role to play in rapid application development, and in distributed applications.
As I mentioned earlier, databases are easy enough to work with, after you've installed the DB server and necessary middleware to communicate with it. The problem with currently available tools is that it is difficult to build distributable packages that use client/server databases. While it is certainly possible to build packages that automatically install and configure the necessary components, doing so requires a lot of time and effort that could otherwise be spent refining the application itself.
While it will not solve SQL compatibility issues between vendors, an interface like RDSP will eliminate the need for an application to be tightly coupled with database middleware (for example, ODBC drivers in Windows, a MySQL package for Perl, etc). This will save time for developers, who will not have to spend so much time on installation scripts, and for users, who will not have to fight with as many database configuration issues.
RSDP will also enable ASPs to host database services, much as companies currently host websites, blogs, and the like. While it may not scale well enough for enterprise applications, this will certainly be viable for most small- and mid-sized projects. Moreover, if the interface proves popular, I think it is likely that people will find workarounds to the performance issues that are likely to arise (for example, by using HTTP load balancing to spread workload around among the servers answering read-only queries).
What is the next step? RDSP front ends can be built with or without vendor participation. To further this end, I have created a wiki where people can post documentation and pointers to RSDP implementations for various database engines. If this takes off, hopefully vendors will follow up by embedding RDSP-like support in their systems.
Apart from simplifying database access, RSDP has the potential to be the equivalent of RSS on steroids. RSDP calls embedded within web pages could link to many different databases to merge dynamic content into web pages.
For example, imagine a website that covers upcoming events. RSDP would make it easy to display event information. One would simply embed a statement like:
rsdprss("events","guest","password","sql","SELECT * FROM calendar WHERE startdate >= #20050402# AND startdate <= #20050502# AND category = 'movies' ORDER BY startdate;")
The web server queries the database server and imports the recordset as an RSS feed. Thus, people would be able to create websites that pull content in using specific search criteria from many different sources. This is just a simple example of how you might do this. Once you make databases more readily accessible, there are all sorts of potential applications.
Brian McConnell is an inventor, author, and serial telecom entrepreneur. He has founded three telecom startups since moving to California. The most recent, Open Communication Systems, designs cutting-edge telecom applications based on open standards telephony technology.
Return to the O'Reilly Network.
Copyright © 2009 O'Reilly Media, Inc.