This 30-day project explores the refactoring of a legacy system. The Everything Engine is an aging software project that powers Perl Monks, Everything 2, and a few other websites. It suffers from poor design and maintainiability. Learn what it’s like to look over the shoulder of an experienced developer as he refactors, redesigns, and updates the code.

Today’s task is to figure out what to do next — and to plan for reshuffling the database code.

Day 23: Untangling the Database Code

After several days (about an hour apiece) of porting procedural tests to the OO-style encouraged by Test::Class, I’ve tripled the number of tests in the system and fixed several bugs (at least one or two in the SUPER() module).

My ultimate goal was to fix the database layer. One of the best examples of the suboptimal design of the system is the repeated code in NodeBase and node that checks if the user is in a workspace.

A workspace is a virtual datastore. That is, an administrative user can create a workspace and then change nodemethods or display templates or anything else in the system and see those changes only while in the workspace. Users of the production system will never see those changes, even if the administrator works on the live machine and even if those changes have compilation errors.

Six years later, the idea of doing development on a live machine, even in a workspace, seems suboptimal, but you live and hopefully you learn.

The important point is that the workspace behavior is the same as the non-workspaced behavior, except that methods such as getNode() fetch node data from a different place and operations such as saving a node actually save revisions of a node.

That argues strongly that it should be possible to subclass (or role decorate) NodeBase with NodeBase::Workspaced to provide a uniform interface to all of the other code while performing slightly different actions inside the appropriate objects.

That’s one of my ultimate goals. Of course, it also would be nice to support SQLite for a testing database. That also supports another design goal of being able to run separate development instances easily. (It might obviate some of the need for workspaces, too. I don’t want to remove any features right now though.)

Looking at the code, Everything::NodeBase is an 1876-line file. There’s a lot of documentation, but there are still 44 methods. (In Vim, :%s/^sub/&/g and look at the number of substitutions.)

The end of the class has four deprecated methods that do not appear anywhere else in ebase or ecore, so they’re candidates for deletion. ZAP.

My first pass through this code is to see how it uses the database. new() calls databaseConnect() with the provided DSN information.

joinWorkspace() joins or leaves a workspace. (Why no leaveWorkspace()? I’ll fix that later.) Code elsewhere does call this method, so I’m not sure how to handle it. Maybe when a user wants to join a workspace, make a note in his or her settings. On the next request, use the workspaced nodebase instead.

That seems clean. The system already supports the idea of not showing changes on the same request, but the redirect-on-request approach is a nice feature to add.

Nothing outside of NodeBase calls getNodeWorkspace(), so that can move into the new class.

rebuildNodetypeModules(), buildNodetypeModules(), and loadNodetypeModule() avoid workspace code. buildNodetypeModules() accesses the database, so some of its code has to move somewhere else (or to an abstracter fetching mechanism).

resetNodeCache() is fine as it is.

I hate getDatabaseHandle(), but it’s a decent abstraction for now. It would be nice if user code never ran raw SQL queries. Having that option may be nice, but it shouldn’t be the default. Something like SQL::Abstract or Jifty::DBI would be much more useful.

getCache() is fine as it is too.

sqlDelete(), sqlSelect(), sqlSelectJoined(), sqlSelectMany(), sqlSelectHashref(), sqlUpdate(), sqlInsert(), _quote_data(), and sqlExecute() should move. For now, I will probably leave them to proxy to the actual database methods.

I want to get rid of getNodeById() also, because the comment says that it’s a temporary wrapper. However, ecore uses it in a few places. (It looks like porting those to use getNode() appropriately will work nicely though. I’ll do that later too.)

This is another part of the design I dislike; overloading getNode() to take all kinds of different types of arguments. The argument checking code there is extensive, where self-documenting method names would make the intent and use of the code clearer.

newNode() is fine. getNode() is mostly fine. It has some workspace-specific code at the end, but that’s fixable. getNodeByName() needs documentation and it has too much SQL-specific code inside. getNodeZero() is fine (it creates a virtual node). getNodeByIdNew() also has some SQL. constructNode() has SQL. getNodeWhere() is a decent abstraction. So is selectNodeWhere(), except its argument list is awful and it uses too much SQL.

I don’t like getNodeCursor() in theory; whatever this method becomes should use an iterator instead. In practice, it has a lot of SQL inside.

countNodeMatches() is a decent abstraction, though it also uses too much SQL.

getType() is fine. getAllTypes() uses SQL. So do getFields() and dropNodeTable(). quote() I hate — why not use placeholders?

genWhereString() seems really out of place. At least getNodetypeTables() is in the private methods section.

Everything else seems sane.

What’s my plan? To move every method that uses SQL or the DBI into a separate module, probably Everything::Storage::DB. That does imply the possibility of Everything::Storage::YAML or something otherwise strange later. That’s fine. I’m not going to do it now and I might not do it ever, but when a design change allows for greater flexibility without sacrificing a sane default, I start to think I’ve found a good solution.

My only concern is with the nodebase initialization. Maybe it doesn’t matter now, with only one type of storage, but what has or should have responsibility for instantiating the storage module? I’ll leave that in the NodeBase constructor for now.

Reading through new(), now I realize that I have already moved some of the initialization code elsewhere, as nothing instantiates NodeBase directly. Instead, it loads the MySQL or PostgreSQL subclasses.

Untangling that could be more of a problem. Perhaps the right approach is to make both modules inherit from Everything::Storage instead, then change the NodeBase constructor. I’ll make that my goal for day 24.

While skimming NodeBase, I fixed up some POD, deleted some old deprecated methods, and fixed some flawed UNIVERSAL::isa() calls. All tests still pass. This is checkin #878.

I have a few moments left, and I just realized that I never turned off SUPER() in Everything::Node. I bet I can remove that and AUTOLOAD() now that nodes inherit properly. Will the tests pass?

One test in t/Node.t fails. It checked for the existence of SUPER(). Adding use SUPER; there fixes it.

After removing all of the use SUPER; and typeglob assignments in the node test classes, everything still passes. This is checkin #879 and it removes over 200 lines of code. I like that.

Tomorrow, the great database reshuffling begins!