This 30-day project explores the refactoring of a legacy system. The Everything Engine is an aging software project that powers Perl Monks, Everything 2, and a few other websites. It suffers from poor design and maintainiability. Learn what it’s like to look over the shoulder of an experienced developer as he refactors, redesigns, and updates the code.
Today’s task is to figure out what to do next — and to plan for reshuffling the database code.
Day 23: Untangling the Database Code
After several days (about an hour apiece) of porting procedural tests to the
OO-style encouraged by Test::Class, I’ve
tripled the number of tests in the system and fixed several bugs (at least one
or two in the SUPER()
module).
My ultimate goal was to fix the database layer. One of the best examples
of the suboptimal design of the system is the repeated code in
NodeBase and node that checks if the user is in a
workspace.
A workspace is a virtual datastore. That is, an administrative user can create a workspace and then change nodemethods or display templates or anything else in the system and see those changes only while in the workspace. Users of the production system will never see those changes, even if the administrator works on the live machine and even if those changes have compilation errors.
Six years later, the idea of doing development on a live machine, even in a workspace, seems suboptimal, but you live and hopefully you learn.
The important point is that the workspace behavior is the same as the
non-workspaced behavior, except that methods such as getNode()
fetch node data from a different place and operations such as saving a node
actually save revisions of a node.
That argues strongly that it should be possible to subclass (or role
decorate) NodeBase with NodeBase::Workspaced to
provide a uniform interface to all of the other code while performing slightly
different actions inside the appropriate objects.
That’s one of my ultimate goals. Of course, it also would be nice to support SQLite for a testing database. That also supports another design goal of being able to run separate development instances easily. (It might obviate some of the need for workspaces, too. I don’t want to remove any features right now though.)
Looking at the code, Everything::NodeBase is an 1876-line file.
There’s a lot of documentation, but there are still 44 methods. (In Vim,
:%s/^sub/&/g and look at the number of substitutions.)
The end of the class has four deprecated methods that do not appear anywhere else in ebase or ecore, so they’re candidates for deletion. ZAP.
My first pass through this code is to see how it uses the database.
new() calls databaseConnect() with the provided
DSN information.
joinWorkspace() joins or leaves a workspace. (Why no
leaveWorkspace()? I’ll fix that later.) Code elsewhere does
call this method, so I’m not sure how to handle it. Maybe when a user wants
to join a workspace, make a note in his or her settings. On the next
request, use the workspaced nodebase instead.
That seems clean. The system already supports the idea of not showing changes on the same request, but the redirect-on-request approach is a nice feature to add.
Nothing outside of NodeBase calls
getNodeWorkspace(), so that can move into the new class.
rebuildNodetypeModules(),
buildNodetypeModules(), and loadNodetypeModule()
avoid workspace code. buildNodetypeModules() accesses the
database, so some of its code has to move somewhere else (or to an
abstracter fetching mechanism).
resetNodeCache() is fine as it is.
I hate getDatabaseHandle(), but it’s a decent abstraction for
now. It would be nice if user code never ran raw SQL queries. Having that
option may be nice, but it shouldn’t be the default. Something like SQL::Abstract or
Jifty::DBI
would be much more useful.
getCache() is fine as it is too.
sqlDelete(), sqlSelect(),
sqlSelectJoined(), sqlSelectMany(),
sqlSelectHashref(), sqlUpdate(),
sqlInsert(), _quote_data(), and
sqlExecute() should move. For now, I will probably leave them to
proxy to the actual database methods.
I want to get rid of getNodeById() also, because the
comment says that it’s a temporary wrapper. However, ecore uses it in a few
places. (It looks like porting those to use getNode()
appropriately will work nicely though. I’ll do that later too.)
This is another part of the design I dislike; overloading
getNode() to take all kinds of different types of arguments.
The argument checking code there is extensive, where self-documenting
method names would make the intent and use of the code clearer.
newNode() is fine. getNode() is mostly fine.
It has some workspace-specific code at the end, but that’s fixable.
getNodeByName() needs documentation and it has too much
SQL-specific code inside. getNodeZero() is fine (it creates a
virtual node). getNodeByIdNew() also has some SQL.
constructNode() has SQL. getNodeWhere() is a
decent abstraction. So is selectNodeWhere(), except its
argument list is awful and it uses too much SQL.
I don’t like getNodeCursor() in theory; whatever this
method becomes should use an iterator instead. In practice, it has a lot of
SQL inside.
countNodeMatches() is a decent abstraction, though it also
uses too much SQL.
getType() is fine. getAllTypes() uses SQL. So
do getFields() and dropNodeTable().
quote() I hate — why not use placeholders?
genWhereString() seems really out of place. At least
getNodetypeTables() is in the private methods section.
Everything else seems sane.
What’s my plan? To move every method that uses SQL or the DBI into a
separate module, probably Everything::Storage::DB. That does imply
the possibility of Everything::Storage::YAML or something
otherwise strange later. That’s fine. I’m not going to do it now and I might
not do it ever, but when a design change allows for greater flexibility without
sacrificing a sane default, I start to think I’ve found a good solution.
My only concern is with the nodebase initialization. Maybe it doesn’t
matter now, with only one type of storage, but what has or should have
responsibility for instantiating the storage module? I’ll leave that in the
NodeBase constructor for now.
Reading through new(), now I realize that I have already
moved some of the initialization code elsewhere, as nothing instantiates
NodeBase directly. Instead, it loads the MySQL or PostgreSQL
subclasses.
Untangling that could be more of a problem. Perhaps the right approach is to
make both modules inherit from Everything::Storage instead, then
change the NodeBase constructor. I’ll make that my goal for day
24.
While skimming NodeBase, I fixed up some POD, deleted some
old deprecated methods, and fixed some flawed UNIVERSAL::isa()
calls. All tests still pass. This is checkin #878.
I have a few moments left, and I just realized that I never turned off
SUPER() in Everything::Node. I bet I can remove that
and AUTOLOAD() now that nodes inherit properly. Will the tests
pass?
One test in t/Node.t fails. It checked for the existence of
SUPER(). Adding use SUPER; there fixes it.
After removing all of the use SUPER; and typeglob
assignments in the node test classes, everything still passes. This is
checkin #879 and it removes over 200 lines of code. I like that.
Tomorrow, the great database reshuffling begins!
