advertisement

Print

RSDP: A Really Simple Proposal

by Brian McConnell, author of Beyond Contact
04/11/2005

We depend on databases for everything, and nearly every useful application talks to some sort of database. Anybody who has written software knows that communicating with databases can be one of the most nettlesome tasks; not because databases are inherently difficult to work with, but because they are difficult to install and configure.

Case in point: I am working on a Python application that requires a smallish back-end database to store documents and user information. My design objective for this project is to make installation as simple as possible, with no dependencies upon external packages or extensions (that is, pure Python).

I thought this would be easy enough, as Python is a great language to work with. I quickly discovered that getting it to work with external databases apart from Berkeley DB (via anydbm) is not so simple. Once you get a particular package, such as PySQLite, installed and configured, it's easy enough. However, try explaining to an end user how to install external packages if they are not already familiar with Python (most aren't). I've spent far too much time fighting with database issues, versus building and debugging my application.

In theory, Python, like most other languages, provides a standard database API. This specification provides high-level access to common database functions such as paging through a recordset. The problem is that each database requires a different extension to Python. Some of these extensions, like PySQLite, are relatively easy to install. Others are a nightmare. This is true of any language, as each database engine has a different installation procedure, as well as a large number of configuration options, any one of which can break a client/server application.

What I don't understand is why, in 2005, we are still using proprietary and unnecessarily complicated wire protocols to talk to database servers when we could be using something much simpler, something akin to RSS. The vast majority of database applications perform a fairly small set of tasks, such as submitting a SQL query, paging through a recordset, and adding a record. Why do we need something like ODBC to do this? The answer is we don't, for many applications.

Most of what we use databases for, we can accomplish with XML-RPC and SQL. So why not define a simple XML-RPC interface (Really Simple Database Protocol, or RSDP), that allows applications to communicate with SQL databases without the need for troublesome external packages and middleware? People are already doing this with in-house projects, so all that is needed is for a couple of vendors to build simple XML-RPC front ends to existing systems.

RSDP, or whatever you want to call it, would not replace existing DB interfaces, nor would it be appropriate for every situation (I'll detail the limitations of this approach later in the article). It would provide developers with a way to prototype and build database driven-applications more independently of back-end systems. The software industry is full of examples of simple "duct tape" solutions, such as RSS, that have become de facto standards despite their limitations.

What might this XML-RPC interface look like? Just one XML-RPC method will do the trick for a basic implementation. Try this on for size:


rsdp(dbname,username,pw,sql)

rsdp("customers.sql","foo","bar","SELECT * FROM customers WHERE 
  lastlogin >= '20041231' ORDER BY lastname;")

Related Reading

Programming Web Services with XML-RPC
By Simon St. Laurent, Joe Johnston, Edd Dumbill

The client application simply invokes this method on the database server, which checks that the user has permission to perform this operation, opens the desired database, performs the SQL query, and responds with an error code or XML recordset. That's all there is to it, and this will be easy to implement in any XML-RPC-aware programming language, which is to say pretty much all of them.

This is obviously a simplified example, but need not be expanded upon very much to provide most of the functionality that is required for most applications. Simply providing a way to submit SQL queries via XML-RPC will enable developers to use whatever SQL features are recognized by the database server, including JOINs, LOCKs, etc.

An interface like RSDP will allow developers to write database applications that are fairly platform-neutral, and will eliminate the need for them to integrate proprietary database extensions such as ODBC drivers with their applications. This will shorten development cycles substantially, and will make installation and troubleshooting much easier. Of course, not every vendor reads SQL the same way (for example, SQLite does not recognize the ALTER TABLE command), but it is relatively easy to deal with these differences, especially for common SELECT and UPDATE operations.

Something like RSDP will also enable the creation of inexpensive database-hosting services that eliminate the need for locally hosted databases. An RSDP database service would enable developers to build highly distributed systems that talk to remote databases that could be located anywhere on the internet. For example, you could write a simple PHP script that lives on your shared web server that, in turn, talks to an RSDP server elsewhere on the Web. There's no need to install and configure MySQL on your web server; thus, the web applications and back-end databases can be treated as separate projects with fewer dependencies. This opens up a lot of possibilities to create new categories of applications.

RSDP's Limitations

RSDP is admittedly a "duct tape" type of solution that many of you may frown upon. I like to compare it to RSS, which was designed to solve a specific problem, content syndication, and did so simply. It has since evolved into a widely used format, and has been adapted to many applications. The Web is full of simple and stupid interfaces that have proven to be effective ways to solve problems. So bear in mind that RSDP admits its simplicity and stupidity up front. I am not advocating its use in situations where it is not appropriate.

The editors who reviewed this article raised a number of objections to this approach, which included: scalability and performance, SQL portability, network overhead, security, support for BLOBs, stored procedure support, and XML inefficiency. These are all valid concerns that I will address one by one.

Scalability and Performance

RSDP has obvious performance limitations because of its dependence on HTTP, XML, and text-encoding of binary information (for example, Base64 encoding).

Database developers obviously value performance for enterprise applications. RSDP is not designed with performance in mind. Its purpose is to provide developers with an easy way to talk to database servers without the need to install drivers or specialized libraries on client machines. Ease of use and installation comes at the expense of performance. The same is true of many tools; interpreted languages are a prominent example. Many developers are happy to trade runtime performance for cross-platform portability.

Most applications do not require massive centralized databases. My project is a case in point. My program will need to store a few hundred documents at any given time. It does not require an especially fast or large database. The same is true of many applications on the Web. A few grow very large, but most do not.

If your application requires a high-performance database, RSDP is probably not a good fit. In that case, build your application around a specific database interface.

Portability

SQL is a kind-of, sort-of standard. Each vendor implements it slightly differently. Thus, RSDP cannot provide complete portability. That's OK, because most developers, even for smaller projects, will settle on one or two preferred back ends. The real value of RSDP is in simplifying installation and troubleshooting. By using a simple XML-RPC interface, RSDP eliminates the need to install specialized drivers or libraries, which eliminates one of the main sources of installation problems.

Furthermore, while SQL is not implemented consistently, it is usually close enough that porting SQL statements from one server to another is straightforward, especially for common types of queries. It's not automatic, but it's not a nightmare, especially if the application avoids using vendor-specific commands.

Network Overhead

A common objection is that XML over HTTP is wasteful. This is true, but the same can be said of many internet protocols. I would guess that RSDP would use several times as much bandwidth as an efficiently encoded query. Again, that's OK, because it's not mandatory that everybody use this interface. If you're really concerned about network overhead, you can always use the standard tools.

There is one interesting aside from the Voice over IP business (my main background is in telecom). VoIP systems allow users and sysadmins to select from a number of compression algorithms to adjust the amount of bandwidth required for each voice call. The base data rate for an uncompressed phone call is 64kbps in each direction. Some VoIP codecs reduce this to well under 10kbps. What's interesting is that, except in situations where bandwidth is really scarce, most people opt for higher bitrate codecs.

Except for high-performance systems, I think the concerns about network overhead are overstated.

Security

Security is an obvious concern, as an unsecured XML-RPC interface would be an invitation for people to do all sorts of unpleasant things. However, this is really no more of a concern here than it is for CGI scripts (a badly written CGI script can give someone root access to the system shell).

There are a number of things that can be done to secure an RSDP server, including:

  • Implement access control lists with different privileges for different users.
  • Disallow certain classes of operations, such as ALTER, for the XML-RPC interface.
  • Limit access to certain IP addresses or to a VPN.
  • Allow only HTTPS queries.
  • Disable the XML-RPC interface by default, unless the server administrator explicitly enables it.
  • Front end RSDP servers with a firewall.

A good RSDP server implementation will provide the sysadmin with a high degree of control over how the XML-RPC interface is presented to the outside world, or whether to enable it at all.

BLOBs

One of the reviewers pointed out that XML-RPC is not a good way to deliver BLOBs. I was not thinking of BLOBs when I wrote this article (my habit has always been to use databases to point to other resources, and to store large files somewhere else). However, it will be possible to send them using Base64 encoding. It's not very efficient, but it will work. If you need to support BLOBs, RSDP may not be a very good way to do this.

Stored Procedures

Stored procedures are obviously an important tool for database developers. Although RSDP does not address this directly, most SQL databases provide the ability to create and call stored procedures via the SQL language (for example, using the CREATE PROCEDURE command), or via separate system administration tools. In most cases, it should be possible to invoke stored procedures via the CALL command.

It Could Be Worse

It's clear that RSDP is not the fastest or most efficient way to talk to database servers. Depending on your goals, that may or may not be a problem. Most of the projects I work on are small- to medium-sized projects where ease of use is more important than speed. So I can live with a performance hit if it means I can quickly prototype an application and focus most of my time on its features, and not on figuring out why a web script is generating an error when it tries to call a DB API function.

I suspect I am not alone in wanting a better way to talk to databases. After all, if performance were the primary design criteria for software, we'd all be programming in C. Judging by the number of books O'Reilly sells about scripting languages, that's clearly not the case.

Pages: 1, 2

Next Pagearrow