This 30-day project explores the refactoring of a legacy system. The Everything Engine is an aging software project that powers Perl Monks, Everything 2, and a few other websites. It suffers from poor design and maintainiability. Learn what it’s like to look over the shoulder of an experienced developer as he refactors, redesigns, and updates the code.
Today’s task is to figure out what to do next — and to plan for reshuffling the database code.
Day 23: Untangling the Database Code
After several days (about an hour apiece) of porting procedural tests to the
OO-style encouraged by
tripled the number of tests in the system and fixed several bugs (at least one
or two in the
My ultimate goal was to fix the database layer. One of the best examples
of the suboptimal design of the system is the repeated code in
node that checks if the user is in a
A workspace is a virtual datastore. That is, an administrative user can create a workspace and then change nodemethods or display templates or anything else in the system and see those changes only while in the workspace. Users of the production system will never see those changes, even if the administrator works on the live machine and even if those changes have compilation errors.
Six years later, the idea of doing development on a live machine, even in a workspace, seems suboptimal, but you live and hopefully you learn.
The important point is that the workspace behavior is the same as the
non-workspaced behavior, except that methods such as
fetch node data from a different place and operations such as saving a node
actually save revisions of a node.
That argues strongly that it should be possible to subclass (or role
provide a uniform interface to all of the other code while performing slightly
different actions inside the appropriate objects.
That’s one of my ultimate goals. Of course, it also would be nice to support SQLite for a testing database. That also supports another design goal of being able to run separate development instances easily. (It might obviate some of the need for workspaces, too. I don’t want to remove any features right now though.)
Looking at the code,
Everything::NodeBase is an 1876-line file.
There’s a lot of documentation, but there are still 44 methods. (In Vim,
:%s/^sub/&/g and look at the number of substitutions.)
The end of the class has four deprecated methods that do not appear anywhere else in ebase or ecore, so they’re candidates for deletion. ZAP.
My first pass through this code is to see how it uses the database.
databaseConnect() with the provided
joinWorkspace() joins or leaves a workspace. (Why no
leaveWorkspace()? I’ll fix that later.) Code elsewhere does
call this method, so I’m not sure how to handle it. Maybe when a user wants
to join a workspace, make a note in his or her settings. On the next
request, use the workspaced nodebase instead.
That seems clean. The system already supports the idea of not showing changes on the same request, but the redirect-on-request approach is a nice feature to add.
Nothing outside of
getNodeWorkspace(), so that can move into the new class.
avoid workspace code.
buildNodetypeModules() accesses the
database, so some of its code has to move somewhere else (or to an
abstracter fetching mechanism).
resetNodeCache() is fine as it is.
getDatabaseHandle(), but it’s a decent abstraction for
now. It would be nice if user code never ran raw SQL queries. Having that
option may be nice, but it shouldn’t be the default. Something like
would be much more useful.
getCache() is fine as it is too.
sqlExecute() should move. For now, I will probably leave them to
proxy to the actual database methods.
I want to get rid of
getNodeById() also, because the
comment says that it’s a temporary wrapper. However, ecore uses it in a few
places. (It looks like porting those to use
appropriately will work nicely though. I’ll do that later too.)
This is another part of the design I dislike; overloading
getNode() to take all kinds of different types of arguments.
The argument checking code there is extensive, where self-documenting
method names would make the intent and use of the code clearer.
newNode() is fine.
getNode() is mostly fine.
It has some workspace-specific code at the end, but that’s fixable.
getNodeByName() needs documentation and it has too much
SQL-specific code inside.
getNodeZero() is fine (it creates a
getNodeByIdNew() also has some SQL.
constructNode() has SQL.
getNodeWhere() is a
decent abstraction. So is
selectNodeWhere(), except its
argument list is awful and it uses too much SQL.
I don’t like
getNodeCursor() in theory; whatever this
method becomes should use an iterator instead. In practice, it has a lot of
countNodeMatches() is a decent abstraction, though it also
uses too much SQL.
getType() is fine.
getAllTypes() uses SQL. So
quote() I hate — why not use placeholders?
genWhereString() seems really out of place. At least
getNodetypeTables() is in the private methods section.
Everything else seems sane.
What’s my plan? To move every method that uses SQL or the DBI into a
separate module, probably
Everything::Storage::DB. That does imply
the possibility of
Everything::Storage::YAML or something
otherwise strange later. That’s fine. I’m not going to do it now and I might
not do it ever, but when a design change allows for greater flexibility without
sacrificing a sane default, I start to think I’ve found a good solution.
My only concern is with the nodebase initialization. Maybe it doesn’t
matter now, with only one type of storage, but what has or should have
responsibility for instantiating the storage module? I’ll leave that in the
NodeBase constructor for now.
new(), now I realize that I have already
moved some of the initialization code elsewhere, as nothing instantiates
NodeBase directly. Instead, it loads the MySQL or PostgreSQL
Untangling that could be more of a problem. Perhaps the right approach is to
make both modules inherit from
Everything::Storage instead, then
NodeBase constructor. I’ll make that my goal for day
NodeBase, I fixed up some POD, deleted some
old deprecated methods, and fixed some flawed
calls. All tests still pass. This is checkin #878.
I have a few moments left, and I just realized that I never turned off
Everything::Node. I bet I can remove that
AUTOLOAD() now that nodes inherit properly. Will the tests
One test in t/Node.t fails. It checked for the existence of
use SUPER; there fixes it.
After removing all of the
use SUPER; and typeglob
assignments in the node test classes, everything still passes. This is
checkin #879 and it removes over 200 lines of code. I like that.
Tomorrow, the great database reshuffling begins!